CN111722833B - SM2 algorithm parallel modular multiplier - Google Patents

Publication number
CN111722833B
Authority
CN
China
Prior art keywords
multiplier
partial product
input value
machine controller
state machine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010557989.1A
Other languages
Chinese (zh)
Other versions
CN111722833A (en
Inventor
陈付龙
刘扬
李宗平
张亭亭
谢冬
沈展
齐学梅
程桂花
徐晟�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Normal University
Original Assignee
Anhui Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Normal University filed Critical Anhui Normal University
Priority to CN202010557989.1A priority Critical patent/CN111722833B/en
Publication of CN111722833A publication Critical patent/CN111722833A/en
Application granted granted Critical
Publication of CN111722833B publication Critical patent/CN111722833B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/60Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
    • G06F7/72Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
    • G06F7/722Modular multiplication
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/45Structures or tools for the administration of authentication
    • G06F21/46Structures or tools for the administration of authentication by designing passwords or checking the strength of passwords

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Complex Calculations (AREA)

Abstract

The embodiment of the invention provides an SM2 algorithm parallel modular multiplier, belonging to the technical field of data encryption computation. The modular multiplier comprises at least one improved multiplier and a finite state machine controller. The SM2 algorithm parallel modular multiplier performs parallel multiplication using improved multipliers, each consisting of a preprocessing circuit, multiplexers, a multiplier and a post-processing circuit. Because both the Karatsuba algorithm for large-number multiplication and the reduction algorithm for the prime P256 recommended by the State Cryptography Administration are based on divide and conquer, the multiplication and reduction stages of modular multiplication are executed in parallel, which improves speed and reduces resource consumption.

Description

SM2 algorithm parallel modular multiplier
Technical Field
The invention relates to the technical field of data encryption computation, in particular to an SM2 algorithm parallel modular multiplier.
Background
With the popularization of computers and the development of internet technology, society has entered the information age, and information communication plays an increasingly important role, so protecting information from theft is a primary concern for users. To keep information secure while it travels over public channels, it must be encrypted. Public key cryptographic algorithms are increasingly widely used, and among them elliptic curve cryptography offers shorter keys, less computation and higher speed than other public key algorithms at the same security level. The SM2 algorithm is an elliptic curve public key cryptographic algorithm developed independently in China. Its hardware implementation is relatively simple, which suits scenarios with high speed requirements, and as a domestic cryptographic algorithm it is significant for updating and popularizing current public key cryptography. The domestic cryptographic algorithm SM2 therefore has broad market prospects.
Large-number modular multiplication is one of the core operations of elliptic curve algorithms and directly affects the efficiency of algorithm execution. Increasing the speed of modular multiplication, or reducing resource consumption while maintaining speed, is of great significance for efficient execution of the algorithm, and the search for better modular multiplication implementations is a research hotspot at many institutions.
Disclosure of Invention
The invention aims to provide an SM2 algorithm parallel modular multiplier which can complete modular multiplication operation of a large number under the condition of reducing resource consumption.
To achieve the above object, an embodiment of the present invention provides an SM2 algorithm parallel modular multiplier, the modular multiplier including:
at least one improved multiplier, the improved multiplier comprising:
a preprocessing circuit, a first end of which is used to receive an input value A_i, a second end to receive an input value A_j, a third end to receive an input value B_i, and a fourth end to receive an input value B_j;
a first multiplexer, a first end of which is connected to the first end of the preprocessing circuit, a second end to the fifth end of the preprocessing circuit, a third end to the sixth end of the preprocessing circuit, and a fourth end to the fourth end of the preprocessing circuit;
a multiplier, a first end of which is connected to the fifth end of the first multiplexer, and a second end of which is connected to the sixth end of the first multiplexer;
a post-processing circuit, a first end of which is connected to the third end of the multiplier, a second end of which is used to output the output value A_i B_j + A_j B_i, a third end of which is connected to the seventh end of the preprocessing circuit, and a fourth end of which is connected to the eighth end of the preprocessing circuit;
a second multiplexer, a first end of which is connected to a node between the multiplier and the post-processing circuit, a second end of which is connected to the fifth end of the post-processing circuit, a third end of which is connected to the seventh end of the first multiplexer, and a fourth end of which is used to output the calculation result; and
a finite state machine controller, a first end of which is used to select the input values A_i, A_j, B_i and B_j, a second end of which is connected to the seventh end of each first multiplexer and the third end of each second multiplexer, and a third end of which is used to control and adjust the bit arrangement of each result so as to obtain the final result.
Optionally, the number of improved multipliers is 4.
Optionally, the multiplier is used to perform ordinary multiplication or Karatsuba algorithm multiplication.
Optionally, the preprocessing circuit comprises a comparator and a subtractor.
Optionally, the post-processing circuit comprises an adder-subtractor and an exclusive-or gate.
Optionally, the finite state machine controller includes 11 states executed in sequence, wherein:
in state 0, all inputs and control signals of the finite state machine controller are 0;
in state 1, the finite state machine controller sends 4 EN=0 signals to the improved multipliers to complete the first round of parallel multiplications and obtain partial product C_0;
in state 2, the finite state machine controller performs the first reduction operation on partial product C_0, and sends 3 EN=0 signals and 1 EN=1 signal to the improved multipliers to complete the second round of 4 parallel multiplications and obtain partial product C_1;
in state 3, the finite state machine controller performs the second reduction operation on partial product C_1, and sends 1 EN=0 signal and 3 EN=1 signals to the improved multipliers to complete the third round of 4 parallel multiplications and obtain partial products C_2 and C_3;
in state 4, the finite state machine controller performs the third reduction operation on partial products C_2 and C_3, and sends 4 EN=1 signals to the improved multipliers to complete the fourth round of 4 parallel multiplications and obtain partial product C_4;
in state 5, the finite state machine controller performs the fourth reduction operation on partial product C_4, and sends 4 EN=1 signals to the improved multipliers to complete the fifth round of 4 parallel multiplications and obtain partial products C_5 and C_6;
in state 6, the finite state machine controller performs the fifth reduction operation on partial products C_5 and C_6, and sends 4 EN=1 signals to the improved multipliers to complete the sixth round of 4 parallel multiplications and obtain partial product C_7;
in state 7, the finite state machine controller performs the sixth reduction operation on partial product C_7, and sends 4 EN=1 signals to the improved multipliers to complete the seventh round of 4 parallel multiplications and obtain partial product C_8;
in state 8, the finite state machine controller performs the seventh reduction operation on partial product C_8, and sends 4 EN=1 signals to the improved multipliers to complete the eighth round of 4 parallel multiplications and obtain partial products C_9 and C_10;
in state 9, the finite state machine controller performs the eighth reduction operation on partial products C_9 and C_10, and sends 4 EN=1 signals to the improved multipliers to complete the ninth round of 4 parallel multiplications and obtain partial products C_11 to C_15;
in state 10, the finite state machine controller performs the ninth reduction operation on partial products C_11 to C_15.
Optionally, the improved multiplier is configured to:
perform Karatsuba algorithm multiplication when a signal EN=1 is received;
perform ordinary multiplication when a signal EN=0 is received.
Optionally, the finite state machine controller is configured to:
when any state is executed, obtain the sequence number of the multiplication to be performed from a preset multiplication allocation table;
obtain the index numbers of the input values A_i, A_j, B_i and B_j from a preset 8-part Karatsuba algorithm expression table according to the sequence number;
select the input values A_i, A_j, B_i and B_j from the numbers to be modularly multiplied according to the index numbers, and input them to the corresponding improved multiplier.
Through the above technical scheme, the SM2 algorithm parallel modular multiplier provided by the invention performs parallel multiplication using improved multipliers, each consisting of a preprocessing circuit, multiplexers, a multiplier and a post-processing circuit. Because both the Karatsuba algorithm for large-number multiplication and the reduction algorithm for the prime P256 recommended by the State Cryptography Administration are based on divide and conquer, the multiplication and reduction stages of modular multiplication are executed in parallel, which improves speed and reduces resource consumption.
Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings are included to provide a further understanding of embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain, without limitation, the embodiments of the invention. In the drawings:
FIG. 1 is a block diagram of an SM2 algorithm parallel modulo multiplier according to one embodiment of the invention;
FIG. 2 is a block diagram of an improved multiplier according to one embodiment of the present invention;
FIG. 3 is a block diagram of an improved multiplier according to one embodiment of the present invention;
FIG. 4 is a flow chart of an operation of an improved multiplier according to one embodiment of the invention;
FIG. 5 is a flow chart of a manner in which a finite state machine controller FSM selects input values according to one embodiment of the present invention; and
FIG. 6 is a decomposition diagram of the partial product accumulation of the Karatsuba algorithm according to one embodiment of the present invention.
Detailed Description
The following describes specific embodiments of the present invention in detail with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.
In the embodiments of the present invention, unless otherwise indicated, terms of orientation such as "upper", "lower", "top" and "bottom" generally refer to the orientation shown in the drawings, or to the positional relationship of the components relative to one another in the vertical or gravitational direction.
In addition, descriptions such as "first" and "second" in the embodiments of the present invention are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art; when a combination of technical solutions is contradictory or cannot be realized, the combination should be considered not to exist and falls outside the scope of protection claimed by the present invention.
A block diagram of an SM2 algorithm parallel modular multiplier according to an embodiment of the invention is shown in FIG. 1. In FIG. 1, the modular multiplier may include at least one improved multiplier mul_muti and a finite state machine controller FSM. The specific structure of the improved multiplier mul_muti may be, for example, as shown in FIG. 2. In FIG. 2, the improved multiplier mul_muti may include a preprocessing circuit pre, a first multiplexer mux1, a multiplier muti, a post-processing circuit post, and a second multiplexer mux2.
In FIG. 2, a first end of the preprocessing circuit pre may be used to receive an input value A_i, a second end may be used to receive an input value A_j, a third end may be used to receive an input value B_i, and a fourth end may be used to receive an input value B_j. A first end of the first multiplexer mux1 may be connected to the first end of the preprocessing circuit pre, a second end may be connected to the fifth end of the preprocessing circuit pre, a third end may be connected to the sixth end of the preprocessing circuit pre, and a fourth end may be connected to the fourth end of the preprocessing circuit pre. A first end of the multiplier muti may be connected to the fifth end of the first multiplexer mux1, and a second end may be connected to the sixth end of the first multiplexer mux1. A first end of the post-processing circuit post may be connected to the third end of the multiplier muti, a second end may be used to output the output value A_i B_j + A_j B_i, a third end may be connected to the seventh end of the preprocessing circuit pre, and a fourth end may be connected to the eighth end of the preprocessing circuit pre. A first end of the second multiplexer mux2 may be connected to a node between the multiplier muti and the post-processing circuit post, a second end may be connected to the fifth end of the post-processing circuit post, a third end may be connected to the seventh end of the first multiplexer mux1, and a fourth end may be used to output the calculation result.
A first end of the finite state machine controller FSM may be used to select the input values A_i, A_j, B_i and B_j, a second end may be connected to the seventh end of each first multiplexer mux1 and to the third end of each second multiplexer mux2, and a third end may be used to control and adjust the bit arrangement of each result so as to obtain the final result.
In one embodiment of the invention, the number of improved multipliers mul_muti may vary with the bit width of the operands. In this embodiment, in the case where the numbers to be modularly multiplied are 256 bits wide, the number of improved multipliers mul_muti may be 4.
In one embodiment of the invention, the multiplier muti may be a multiplier known to those skilled in the art for performing ordinary multiplication or Karatsuba algorithm multiplication.
In this embodiment, the specific structures of the preprocessing circuit pre and the post-processing circuit post may be, for example, as shown in FIG. 3 (the multiplexers are omitted from FIG. 3 because they mainly serve as connections). In FIG. 3, the preprocessing circuit pre may include comparators and subtractors. The post-processing circuit post may include an adder-subtractor add/sub and an exclusive-OR gate. The comparators may include a first comparator cmp0 and a second comparator cmp1. The subtractors may include a first subtractor sub0 and a second subtractor sub1. A first end of the first comparator cmp0 may be used to receive the input value A_i, a second end may be used to receive the input value A_j, and a third end may be connected to the first end of the exclusive-OR gate. A first end of the second comparator cmp1 may be used to receive the input value B_i, a second end may be used to receive the input value B_j, and a third end may be connected to the second end of the exclusive-OR gate. A first end of the first subtractor sub0 may be used to receive the input value A_i, a second end may be used to receive the input value A_j, a third end may be connected to the first end of the exclusive-OR gate, and a fourth end may be connected to the first end of the multiplier muti. A first end of the second subtractor sub1 may be used to receive the input value B_i, a second end may be used to receive the input value B_j, a third end may be connected to the second end of the exclusive-OR gate, and a fourth end may be connected to the second end of the multiplier muti.
The third end of the multiplier muti may be connected to a first end of the adder-subtractor add/sub, a second end of the adder-subtractor add/sub may be connected to a third end of the exclusive-OR gate, a fourth end of the adder-subtractor add/sub may be used to output the output value A_i B_j + A_j B_i, and a fifth end may be used to output the output value A_i B_i + A_j B_j. In FIG. 3, cmpA and cmpB are the comparison signals, OP is the exclusive-OR of cmpA and cmpB, and tempA and tempB are the intermediate values of A and B, respectively.
In this embodiment, the specific operation of the modified multiplier mul_muti may be, for example, as shown in fig. 4. In fig. 4:
With input values A_i, A_j, B_i and B_j and EN=1, the first comparator cmp0 and the second comparator cmp1 compare A_i with A_j and B_i with B_j, respectively. Taking A_i and A_j as an example: if A_i > A_j, then cmpA=1 and the first subtractor outputs the intermediate value tempA = A_i - A_j; otherwise cmpA=0 and tempA = A_j - A_i. B_i and B_j are handled in the same way and are not described again.
Next, the multiplier muti multiplies tempA and tempB, i.e., temp = tempA * tempB. The exclusive-OR gate computes the exclusive-OR of cmpA and cmpB to obtain op, i.e., op = cmpA XOR cmpB. The adder-subtractor then combines A_i B_i + A_j B_j with temp according to op to obtain the output value out. Specifically, when op=1, out is the sum of A_i B_i + A_j B_j and temp; when op=0, out is the difference of A_i B_i + A_j B_j and temp. In both cases out equals A_i B_j + A_j B_i, because A_i B_j + A_j B_i = A_i B_i + A_j B_j - (A_i - A_j)(B_i - B_j).
In the case of EN=0, tempA = A_i, tempB = B_i, temp = tempA * tempB, and out = temp.
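The data path described above can be modelled in a few lines of Python. This is only a behavioural sketch under stated assumptions, not the patented circuit: the function name `improved_multiplier` and the parameter `diag_sum` (standing for the value A_i B_i + A_j B_j supplied to the adder-subtractor) are illustrative names that do not appear in the source.

```python
def improved_multiplier(a_i, a_j, b_i, b_j, en, diag_sum=0):
    """Behavioural model of one improved multiplier (illustrative names).

    en = 0: ordinary multiplication, returns a_i * b_i.
    en = 1: Karatsuba cross term, returns a_i*b_j + a_j*b_i using a single
            multiplication; diag_sum must hold a_i*b_i + a_j*b_j.
    """
    if en == 0:
        # tempA = A_i, tempB = B_i, out = temp
        return a_i * b_i

    # Preprocessing: comparators and subtractors yield absolute differences.
    cmp_a = 1 if a_i > a_j else 0
    cmp_b = 1 if b_i > b_j else 0
    temp_a = a_i - a_j if cmp_a else a_j - a_i
    temp_b = b_i - b_j if cmp_b else b_j - b_i

    temp = temp_a * temp_b        # the only multiplication in this path
    op = cmp_a ^ cmp_b            # XOR of the two comparison signals

    # Post-processing: add when the difference signs disagree, else subtract.
    return diag_sum + temp if op else diag_sum - temp
```

Working with absolute differences keeps the multiplier operands non-negative, which is why the sign information has to be carried separately by op.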
In this embodiment, the finite state machine controller FSM may include 11 states executed in sequence, wherein:
in state 0, all inputs and control signals of the finite state machine controller FSM are 0;
in state 1, the finite state machine controller FSM sends 4 EN=0 signals to the improved multipliers to complete the first round of parallel multiplications and obtain partial product C_0;
in state 2, the finite state machine controller FSM performs the first reduction operation on partial product C_0, and sends 3 EN=0 signals and 1 EN=1 signal to the improved multipliers to complete the second round of 4 parallel multiplications and obtain partial product C_1;
in state 3, the finite state machine controller FSM performs the second reduction operation on partial product C_1, and sends 1 EN=0 signal and 3 EN=1 signals to the improved multipliers to complete the third round of 4 parallel multiplications and obtain partial products C_2 and C_3;
in state 4, the finite state machine controller FSM performs the third reduction operation on partial products C_2 and C_3, and sends 4 EN=1 signals to the improved multipliers to complete the fourth round of 4 parallel multiplications and obtain partial product C_4;
in state 5, the finite state machine controller FSM performs the fourth reduction operation on partial product C_4, and sends 4 EN=1 signals to the improved multipliers to complete the fifth round of 4 parallel multiplications and obtain partial products C_5 and C_6;
in state 6, the finite state machine controller FSM performs the fifth reduction operation on partial products C_5 and C_6, and sends 4 EN=1 signals to the improved multipliers to complete the sixth round of 4 parallel multiplications and obtain partial product C_7;
in state 7, the finite state machine controller FSM performs the sixth reduction operation on partial product C_7, and sends 4 EN=1 signals to the improved multipliers to complete the seventh round of 4 parallel multiplications and obtain partial product C_8;
in state 8, the finite state machine controller FSM performs the seventh reduction operation on partial product C_8, and sends 4 EN=1 signals to the improved multipliers to complete the eighth round of 4 parallel multiplications and obtain partial products C_9 and C_10;
in state 9, the finite state machine controller FSM performs the eighth reduction operation on partial products C_9 and C_10, and sends 4 EN=1 signals to the improved multipliers to complete the ninth round of 4 parallel multiplications and obtain partial products C_11 to C_15;
in state 10, the finite state machine controller FSM performs the ninth reduction operation on partial products C_11 to C_15.
In this embodiment, an improved multiplier performs Karatsuba algorithm multiplication when it receives a signal EN=1, and ordinary multiplication when it receives a signal EN=0.
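The EN signals issued in states 1 to 9 can be tabulated to check the operation counts. The sketch below is illustrative only: the names `EN_SCHEDULE` and `en_signals` are hypothetical, and the order in which the signals are assigned to the four multipliers is an assumption, since the source gives only the counts per state.

```python
# (number of EN=0 signals, number of EN=1 signals) issued in states 1..9,
# copied from the state descriptions; states 0 and 10 issue no multiplications.
EN_SCHEDULE = {
    1: (4, 0),
    2: (3, 1),
    3: (1, 3),
    4: (0, 4), 5: (0, 4), 6: (0, 4),
    7: (0, 4), 8: (0, 4), 9: (0, 4),
}

def en_signals(state):
    """Return the four EN values for one round (ordering is an assumption)."""
    ordinary, karatsuba = EN_SCHEDULE[state]
    return [0] * ordinary + [1] * karatsuba

total_ordinary = sum(o for o, _ in EN_SCHEDULE.values())    # EN=0 products
total_karatsuba = sum(k for _, k in EN_SCHEDULE.values())   # EN=1 products
```

The totals come to 8 ordinary and 28 Karatsuba multiplications, 36 in all. This is consistent with splitting each operand into 8 segments: 8 products of the form A_i B_i plus one cross term for each of the 28 pairs i < j.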
In addition, since the input values of each parallel multiplication and subtraction operation are different, the finite state machine controller FSM may select the input values in any of a variety of ways known to those skilled in the art. In a preferred example of the present invention, taking the case where the numbers A and B are both 256-bit large numbers as an example, the selection may follow the steps shown in FIG. 5. Specifically, in FIG. 5, the selection may include:
In step S10, when any one of the states is executed, the sequence number of the multiplication to be performed is obtained from the preset multiplication allocation table. Specifically, the preset multiplication allocation table may be, for example, as shown in Table 1.
Table 1. Multiplication allocation table (available only as an image in the original publication)
In step S11, the index numbers of the input values A_i, A_j, B_i and B_j are obtained from a preset 8-part Karatsuba algorithm expression table according to the selected sequence number. Specifically, the 8-part Karatsuba algorithm expression table may be, for example, as shown in Table 2.
Table 2. 8-part Karatsuba algorithm expression table (available only as an image in the original publication)
In step S12, the input values A_i, A_j, B_i and B_j are selected from the numbers to be modularly multiplied according to the index numbers and input to the corresponding improved multiplier.
In this embodiment, in the case where the large integers A and B are 256 bits wide, the partial product accumulation of the above Karatsuba algorithm decomposes as shown in FIG. 6. C_15 to C_0 are calculated according to formula (1):
$$C_0 = S_{0\_l},\qquad C_n = S_{n\_l} + S_{(n-1)\_h} + \mathit{Cin}_{(n-1)} \quad (n \in [1, 15]) \tag{1}$$
As can be seen from Table 2, 36 multiplications are required to compute the product of the large numbers A and B. When 4 multipliers are used, a total of 9 rounds are needed; the specific multiplication allocation is shown in Tables 1 and 2.
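The count of 36 multiplications can be reproduced with a one-level, 8-part Karatsuba product over 32-bit segments. The following is a minimal software sketch, not the scheduling of Tables 1 and 2 (which are not available here): `split8`, `karatsuba8` and the column sums `S` are illustrative names, and the carry loop plays the role of formula (1) without reproducing its exact split into low part, high part and carry-in.

```python
from itertools import combinations

W = 32                       # segment width in bits (256 / 8)
MASK = (1 << W) - 1

def split8(x):
    """Split a 256-bit integer into 8 segments, least significant first."""
    return [(x >> (W * k)) & MASK for k in range(8)]

def karatsuba8(a, b):
    """Return (C, count): the 16 words C_0..C_15 of a*b and the number of
    segment multiplications used."""
    A, B = split8(a), split8(b)
    count = 0
    S = [0] * 15             # column sums for weights 2^(32*n), n = 0..14

    # 8 ordinary products A_i * B_i (the EN=0 case).
    diag = []
    for i in range(8):
        diag.append(A[i] * B[i])
        count += 1
        S[2 * i] += diag[i]

    # 28 cross terms A_i*B_j + A_j*B_i, one multiplication each (EN=1 case).
    for i, j in combinations(range(8), 2):
        temp = abs(A[i] - A[j]) * abs(B[i] - B[j])
        count += 1
        same_sign = (A[i] > A[j]) == (B[i] > B[j])
        S[i + j] += diag[i] + diag[j] - (temp if same_sign else -temp)

    # Carry propagation into 32-bit words.
    C, carry = [], 0
    for n in range(15):
        carry += S[n]
        C.append(carry & MASK)
        carry >>= W
    C.append(carry)          # C_15
    return C, count
```

For any two 256-bit inputs, reassembling the words as the sum of C_n * 2^(32n) gives back a*b, and `count` is 36.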
p_256 is the prime recommended by the State Cryptography Administration for use with the SM2 algorithm. Its value is given below and can also be written in the form of formula (2).
p_256 = FFFFFFFE FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 00000000 FFFFFFFF FFFFFFFF
$$p_{256} = 2^{256} - 2^{224} - 2^{96} + 2^{64} - 1 \tag{2}$$
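As a quick sanity check, the hexadecimal value of the SM2 prime and the closed form of formula (2) can be compared directly. The variable names below are illustrative.

```python
# SM2 recommended prime, written as eight 32-bit words (most significant first).
P256_HEX = ("FFFFFFFE FFFFFFFF FFFFFFFF FFFFFFFF "
            "FFFFFFFF 00000000 FFFFFFFF FFFFFFFF")

p_from_hex = int(P256_HEX.replace(" ", ""), 16)

# Closed form of formula (2).
p_from_formula = 2**256 - 2**224 - 2**96 + 2**64 - 1

values_match = (p_from_hex == p_from_formula)
```

The sparse form with only five signed powers of 2 is what makes reduction possible with shifts, additions and subtractions alone.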
Taking two 256-bit large integers A and B as an example, the modular multiplication may be performed as follows:
1. Since $2^{256} \equiv 2^{224} + 2^{96} - 2^{64} + 1 \pmod{p_{256}}$, it follows that:
$$2^{288} \equiv 2^{256}\cdot 2^{32} \equiv 2^{224} + 2^{128} - 2^{64} + 2^{32} + 1 \pmod{p_{256}}$$
$$2^{320} \equiv 2^{288}\cdot 2^{32} \equiv 2^{224} + 2^{160} + 2^{32} + 1 \pmod{p_{256}}$$
$$2^{352} \equiv 2^{320}\cdot 2^{32} \equiv 2^{224} + 2^{192} + 2^{96} + 2^{32} + 1 \pmod{p_{256}}$$
$$2^{384} \equiv 2^{352}\cdot 2^{32} \equiv 2\cdot 2^{224} + 2^{128} + 2^{96} + 2^{32} + 1 \pmod{p_{256}}$$
$$2^{416} \equiv 2^{384}\cdot 2^{32} \equiv 2\cdot 2^{224} + 2^{160} + 2^{128} + 2\cdot 2^{96} - 2^{64} + 2^{32} + 2 \pmod{p_{256}}$$
$$2^{448} \equiv 2^{416}\cdot 2^{32} \equiv 2\cdot 2^{224} + 2^{192} + 2^{160} + 2\cdot 2^{128} + 2^{96} - 2^{64} + 2\cdot 2^{32} + 2 \pmod{p_{256}}$$
$$2^{480} \equiv 2^{448}\cdot 2^{32} \equiv 3\cdot 2^{224} + 2^{192} + 2\cdot 2^{160} + 2^{128} + 2^{96} + 2\cdot 2^{32} + 2 \pmod{p_{256}}$$
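The congruences above can be checked numerically. In this sketch each residue is stored as a tuple of signed coefficients for the powers 2^224, 2^192, 2^160, 2^128, 2^96, 2^64, 2^32 and 2^0; the names `RESIDUES` and `evaluate` are illustrative.

```python
P = 2**256 - 2**224 - 2**96 + 2**64 - 1

# exponent -> coefficients of (2^224, 2^192, 2^160, 2^128, 2^96, 2^64, 2^32, 2^0)
RESIDUES = {
    256: (1, 0, 0, 0, 1, -1, 0, 1),
    288: (1, 0, 0, 1, 0, -1, 1, 1),
    320: (1, 0, 1, 0, 0,  0, 1, 1),
    352: (1, 1, 0, 0, 1,  0, 1, 1),
    384: (2, 0, 0, 1, 1,  0, 1, 1),
    416: (2, 0, 1, 1, 2, -1, 1, 2),
    448: (2, 1, 1, 2, 1, -1, 2, 2),
    480: (3, 1, 2, 1, 1,  0, 2, 2),
}

def evaluate(coeffs):
    """Evaluate a coefficient tuple as an integer."""
    return sum(c * 2**(32 * (7 - k)) for k, c in enumerate(coeffs))

# True when every listed expression is congruent to 2^e modulo P.
all_hold = all((2**e - evaluate(c)) % P == 0 for e, c in RESIDUES.items())
```

Each row follows from the previous one by multiplying by 2^32 and replacing the resulting 2^256 term with the first row.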
Applying the residues of each power of 2 modulo p_256, derived from formula (2), to the partial products gives:
$$C_8\cdot 2^{256} \equiv C_8\cdot 2^{224} + C_8\,(2^{96} - 2^{64} + 1) \pmod{p_{256}}$$
$$C_9\cdot 2^{288} \equiv C_9\cdot 2^{224} + C_9\,(2^{128} - 2^{64} + 2^{32} + 1) \pmod{p_{256}}$$
$$C_{10}\cdot 2^{320} \equiv C_{10}\cdot 2^{224} + C_{10}\,(2^{160} + 2^{32} + 1) \pmod{p_{256}}$$
$$C_{11}\cdot 2^{352} \equiv C_{11}\cdot 2^{224} + C_{11}\,(2^{192} + 2^{96} + 2^{32} + 1) \pmod{p_{256}}$$
$$C_{12}\cdot 2^{384} \equiv 2\,C_{12}\cdot 2^{224} + C_{12}\,(2^{128} + 2^{96} + 2^{32} + 1) \pmod{p_{256}}$$
$$C_{13}\cdot 2^{416} \equiv 2\,C_{13}\cdot 2^{224} + C_{13}\,(2^{160} + 2^{128} + 2\cdot 2^{96} - 2^{64} + 2^{32} + 2) \pmod{p_{256}}$$
$$C_{14}\cdot 2^{448} \equiv 2\,C_{14}\cdot 2^{224} + C_{14}\,(2^{192} + 2^{160} + 2\cdot 2^{128} + 2^{96} - 2^{64} + 2\cdot 2^{32} + 2) \pmod{p_{256}}$$
$$C_{15}\cdot 2^{480} \equiv 3\,C_{15}\cdot 2^{224} + C_{15}\,(2^{192} + 2\cdot 2^{160} + 2^{128} + 2^{96} + 2\cdot 2^{32} + 2) \pmod{p_{256}}$$
Thus, the product C of the large integers A and B, reduced modulo p_256, can be expressed as formula (3) or formula (4):
$$
\begin{aligned}
C \bmod p_{256} \equiv{} & 2^{224}\,(3C_{15} + 2C_{14} + 2C_{13} + 2C_{12} + C_{11} + C_{10} + C_9 + C_8 + C_7) \\
& + 2^{192}\,(C_6 + C_{11} + C_{14} + C_{15}) + 2^{160}\,(C_5 + C_{10} + C_{13} + C_{14} + 2C_{15}) \\
& + 2^{128}\,(C_4 + C_9 + C_{12} + C_{13} + 2C_{14} + C_{15}) + 2^{96}\,(C_3 + C_8 + C_{11} + C_{12} + 2C_{13} + C_{14} + C_{15}) \\
& - 2^{64}\,(C_8 + C_9 + C_{13} + C_{14}) + 2^{64}\,C_2 + 2^{32}\,(C_1 + C_9 + C_{10} + C_{11} + C_{12} + C_{13} + 2C_{14} + 2C_{15}) \\
& + C_0 + C_8 + C_9 + C_{10} + C_{11} + C_{12} + 2\,(C_{13} + C_{14} + C_{15}) \pmod{p_{256}}
\end{aligned}
\tag{3}
$$
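Formula (3) can be exercised in software by feeding it the sixteen 32-bit words of a 512-bit product. The sketch below is an illustration under assumptions, not the hardware data path: `reduce_p256` is a hypothetical name, and the final `% P` uses Python big-integer arithmetic in place of the further reduction of the 2^224 terms and the final correction that a circuit would perform.

```python
P = 2**256 - 2**224 - 2**96 + 2**64 - 1

def reduce_p256(c):
    """Reduce a 512-bit product given as words c[0]..c[15] (32 bits each,
    least significant first) using the grouping of formula (3)."""
    acc = (3 * c[15] + 2 * (c[14] + c[13] + c[12])
           + c[11] + c[10] + c[9] + c[8] + c[7]) << 224
    acc += (c[6] + c[11] + c[14] + c[15]) << 192
    acc += (c[5] + c[10] + c[13] + c[14] + 2 * c[15]) << 160
    acc += (c[4] + c[9] + c[12] + c[13] + 2 * c[14] + c[15]) << 128
    acc += (c[3] + c[8] + c[11] + c[12] + 2 * c[13] + c[14] + c[15]) << 96
    acc += (c[2] - (c[8] + c[9] + c[13] + c[14])) << 64   # may be negative
    acc += (c[1] + c[9] + c[10] + c[11] + c[12] + c[13]
            + 2 * c[14] + 2 * c[15]) << 32
    acc += c[0] + c[8] + c[9] + c[10] + c[11] + c[12] + 2 * (c[13] + c[14] + c[15])
    return acc % P   # Python's % returns a value in [0, P) even if acc < 0
```

A typical use is to split a product a*b into words with `(prod >> (32 * n)) & 0xFFFFFFFF` for n from 0 to 15 and compare the result with `prod % P`.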
In addition, the reduction operations may be assigned as shown in Table 3.
Table 3 (available only as an image in the original publication)
The operations in rows S8, S10, S19 and S20 are multiplied by 2, and the operations in rows S13 to S16 are subtracted. The reduction and the parallel multiplication may be performed in parallel; the specific manner of parallel operation may be as shown in Table 4.
Table 4 (available only as an image in the original publication)
Namely:
$$
\begin{aligned}
\mathrm{Result} ={} & S_1 + S_2 + S_3 + S_4 + S_5 + S_6 + S_7 + S_9 + S_{11} + S_{12} + S_{17} + S_{18} \\
& + 2\,(S_{13} + S_{14} + S_{15} + S_{16}) - S_8 - S_{10} - S_{19} - S_{20} \pmod{p_{256}}
\end{aligned}
\tag{4}
$$
Through the above technical scheme, the SM2 algorithm parallel modular multiplier provided by the invention performs parallel multiplication using improved multipliers, each consisting of a preprocessing circuit, multiplexers, a multiplier and a post-processing circuit. Because both the Karatsuba algorithm for large-number multiplication and the reduction algorithm for the prime p_256 recommended by the State Cryptography Administration are based on divide and conquer, the multiplication and the reduction of modular multiplication are executed in parallel, which improves speed and reduces resource consumption.
The optional embodiments of the present invention have been described in detail above with reference to the accompanying drawings. The embodiments of the present invention are not, however, limited to the specific details of the foregoing embodiments. Within the scope of the technical concept of the embodiments of the present invention, various simple modifications may be made to the technical solutions, and all such simple modifications fall within the protection scope of the embodiments of the present invention.
In addition, the specific features described in the above embodiments may be combined in any suitable manner provided there is no contradiction. To avoid unnecessary repetition, the various possible combinations are not described separately.
Those skilled in the art will appreciate that all or part of the steps of the methods in the above embodiments may be implemented by a program stored in a storage medium, the program including instructions that cause a single-chip microcomputer, a chip, or a processor to perform all or part of the steps of the methods described in the embodiments. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In addition, the various embodiments of the present invention may be combined with one another in any manner, and such combinations should likewise be regarded as disclosed by the embodiments of the present invention, provided they do not depart from the idea of those embodiments.

Claims (8)

1. An SM2 algorithm parallel modular multiplier, comprising:
at least one improved multiplier, the improved multiplier comprising:
a preprocessing circuit, a first end of which is used for receiving an input value A_i, a second end of which is used for receiving an input value A_j, a third end of which is used for receiving an input value B_i, and a fourth end of which is used for receiving an input value B_j;
a first multiplexer, a first end of which is connected to the first end of the preprocessing circuit, a second end of which is connected to a fifth end of the preprocessing circuit, a third end of which is connected to a sixth end of the preprocessing circuit, and a fourth end of which is connected to the fourth end of the preprocessing circuit;
a multiplier, a first end of which is connected to a fifth end of the first multiplexer, and a second end of which is connected to a sixth end of the first multiplexer;
a post-processing circuit, a first end of which is connected to a third end of the multiplier, a second end of which is used for outputting an output value A_iB_j + A_jB_i, a third end of which is connected to a seventh end of the preprocessing circuit, and a fourth end of which is connected to an eighth end of the preprocessing circuit;
a second multiplexer, a first end of which is connected to a node between the multiplier and the post-processing circuit, a second end of which is connected to a fifth end of the post-processing circuit, a third end of which is connected to a seventh end of the first multiplexer, and a fourth end of which is used for outputting a calculation result; and
a finite state machine controller, a first end of which is used for selecting the input values A_i, A_j, B_i, and B_j, a second end of which is connected to the seventh end of each first multiplexer and the third end of each second multiplexer, and a third end of which is used for controlling and adjusting the bit alignment of each result so as to obtain a final result.
2. The modular multiplier of claim 1, wherein the number of improved multipliers is 4.
3. The modular multiplier of claim 2, wherein the improved multiplier is configured to perform ordinary multiplication or Karatsuba multiplication.
4. The modular multiplier of claim 1, wherein the preprocessing circuit comprises a comparator and a subtractor.
5. The modular multiplier of claim 1, wherein the post-processing circuit comprises an adder-subtractor and an exclusive-or gate.
6. The modular multiplier of claim 3, wherein the finite state machine controller comprises 11 states that execute in sequence, wherein:
in state 0, all inputs and control signals of the finite state machine controller are 0;
in state 1, the finite state machine controller sends 4 EN=0 signals to the improved multipliers to complete the first round of 4 parallel multiplications and obtain partial product C_0;
in state 2, the finite state machine controller performs a first reduction operation on partial product C_0 and sends 3 EN=0 signals and 1 EN=1 signal to the improved multipliers to complete the second round of 4 parallel multiplications and obtain partial product C_1;
in state 3, the finite state machine controller performs a second reduction operation on partial product C_1 and sends 1 EN=0 signal and 3 EN=1 signals to the improved multipliers to complete the third round of 4 parallel multiplications and obtain partial products C_2 and C_3;
in state 4, the finite state machine controller performs a third reduction operation on partial products C_2 and C_3 and sends 4 EN=1 signals to the improved multipliers to complete the fourth round of 4 parallel multiplications and obtain partial product C_4;
in state 5, the finite state machine controller performs a fourth reduction operation on partial product C_4 and sends 4 EN=1 signals to the improved multipliers to complete the fifth round of 4 parallel multiplications and obtain partial products C_5 and C_6;
in state 6, the finite state machine controller performs a fifth reduction operation on partial products C_5 and C_6 and sends 4 EN=1 signals to the improved multipliers to complete the sixth round of 4 parallel multiplications and obtain partial product C_7;
in state 7, the finite state machine controller performs a sixth reduction operation on partial product C_7 and sends 4 EN=1 signals to the improved multipliers to complete the seventh round of 4 parallel multiplications and obtain partial product C_8;
in state 8, the finite state machine controller performs a seventh reduction operation on partial product C_8 and sends 4 EN=1 signals to the improved multipliers to complete the eighth round of 4 parallel multiplications and obtain partial products C_9 and C_10;
in state 9, the finite state machine controller performs an eighth reduction operation on partial products C_9 and C_10 and sends 4 EN=1 signals to the improved multipliers to complete the ninth round of 4 parallel multiplications and obtain partial products C_11 to C_15;
in state 10, the finite state machine controller performs a ninth reduction operation on partial products C_11 to C_15.
7. The modular multiplier of claim 6, wherein the improved multiplier is configured to:
perform Karatsuba multiplication when an EN=1 signal is received;
perform ordinary multiplication when an EN=0 signal is received.
8. The modular multiplier of claim 6, wherein the finite state machine controller is configured to:
when executing any state, obtain from a preset multiplication allocation table the sequence number of the multiplication to be performed;
obtain, from a preset 8-division Karatsuba algorithm expression table and according to the sequence number, the indices of the input values A_i, A_j, B_i, and B_j;
select, according to the indices, the input values A_i, A_j, B_i, and B_j from the numbers to be multiplied, and input them to the corresponding improved multiplier.
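The EN pattern listed in claim 6 adds up to 8 ordinary multiplications (4 + 3 + 1) and 28 Karatsuba multiplications (1 + 3 + 6 × 4), which is what an 8-way split of two 256-bit operands into 32-bit words requires: 8 products A_iB_i and one cross term for each of the 28 pairs i < j. The Python sketch below checks that count under stated assumptions. The job order is hypothetical, since the multiplication allocation table and the Karatsuba expression table are not reproduced here, and the reduction that the controller runs in parallel is omitted. The name `multiply_256` is likewise illustrative.

```python
import random
from itertools import combinations

WORD = 32
MASK = (1 << WORD) - 1

# EN signals sent to the 4 improved multipliers in states 1..9 (taken from claim 6).
EN_PER_ROUND = [
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 1, 1],
] + [[1, 1, 1, 1]] * 6


def split_words(x, n=8):
    """Split x into n little-endian 32-bit words."""
    return [(x >> (WORD * k)) & MASK for k in range(n)]


def multiply_256(a, b):
    """Multiply two 256-bit integers in 9 rounds of 4 multiplications each.

    Returns (product, number_of_multiplications). The job order is an assumption.
    """
    A, B = split_words(a), split_words(b)
    diag_jobs = list(range(8))                       # ordinary products A_i * B_i
    cross_jobs = list(combinations(range(8), 2))     # Karatsuba cross terms, i < j
    diag = {}
    product = 0
    mult_count = 0
    for en_signals in EN_PER_ROUND:
        finished = {}  # results of this round become visible only to later rounds
        for en in en_signals:
            mult_count += 1
            if en == 0:
                i = diag_jobs.pop(0)
                finished[i] = A[i] * B[i]
                product += finished[i] << (2 * WORD * i)
            else:
                i, j = cross_jobs.pop(0)
                # A_i*B_j + A_j*B_i from one multiplication and two earlier products.
                cross = diag[i] + diag[j] - (A[i] - A[j]) * (B[i] - B[j])
                product += cross << (WORD * (i + j))
        diag.update(finished)
    assert not diag_jobs and not cross_jobs
    return product, mult_count


if __name__ == "__main__":
    a, b = random.getrandbits(256), random.getrandbits(256)
    result, count = multiply_256(a, b)
    assert result == a * b and count == 36
```

With this ordering, every cross term uses only products finished in an earlier round, so the four multiplications of a round are independent of one another.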
CN202010557989.1A 2020-06-18 2020-06-18 SM2 algorithm parallel modular multiplier Active CN111722833B (en)

Publications (2)

Publication Number Publication Date
CN111722833A CN111722833A (en) 2020-09-29
CN111722833B CN111722833B (en) 2023-06-02



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant