CN114978473A - Processing method of SM3 algorithm, processor, chip and electronic equipment - Google Patents

Info

Publication number
CN114978473A
Authority
CN
China
Prior art keywords
word
message
round
state
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210493013.1A
Other languages
Chinese (zh)
Other versions
CN114978473B (en)
Inventor
姚涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haiguang Information Technology Co Ltd
Original Assignee
Haiguang Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haiguang Information Technology Co Ltd
Priority to CN202210493013.1A
Publication of CN114978473A
Application granted
Publication of CN114978473B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0643 Hash functions, e.g. MD5, SHA, HMAC or f9 MAC
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0618 Block ciphers, i.e. encrypting groups of characters of a plain text message using fixed encryption transformation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2209/00 Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
    • H04L2209/12 Details relating to cryptographic hardware or logic circuitry
    • H04L2209/125 Parallelization or pipelining, e.g. for accelerating processing of cryptographic operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Power Engineering (AREA)
  • Advance Control (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

The embodiments of the application provide a processing method for the SM3 algorithm, a processor, a chip and an electronic device. The method comprises: reading the message word and message parameter corresponding to the current round of calculation from a first source operand held in a first source operand register, where the first source operand comprises a plurality of message words and message parameters and has the same bit width as the round calculation result; calculating, according to a single-round calculation instruction, the state word of the next state from the state word of the previous state and the message word and message parameter corresponding to the current round; and storing the state word of the next state in a first destination operand held in a first destination operand register, where the first destination operand has the same bit width as the round calculation result. The embodiments can increase the processing speed of the SM3 algorithm.

Description

Processing method of SM3 algorithm, processor, chip and electronic equipment
Technical Field
The embodiments of the application relate to the technical field of cryptography, and in particular to a processing method for the SM3 algorithm, a processor, a chip and an electronic device.
Background
The SM3 algorithm is a cryptographic hash function standard: for a message shorter than $2^{64}$ bits it produces a 256-bit hash value, i.e. the message digest (bit string) that the hash algorithm outputs for the message. As a cryptographic hash algorithm, SM3 can be used in commercial cryptographic applications that require cryptographic security, such as generating and verifying digital signatures, generating and verifying message authentication codes, and generating random numbers. Given how widely SM3 is used, increasing its processing speed is a problem that those skilled in the art urgently need to solve.
Disclosure of Invention
In view of this, embodiments of the present application provide a processing method, a processor, a chip, and an electronic device for an SM3 algorithm, so as to improve a processing speed of the SM3 algorithm.
In order to achieve the above object, the embodiments of the present application provide the following technical solutions.
In a first aspect, an embodiment of the present application provides a processing method for an SM3 algorithm, including:
reading the message word and message parameter corresponding to the current round of calculation from a first source operand held in a first source operand register, where the first source operand comprises a plurality of message words and message parameters, and the bit width of the first source operand is the same as the bit width of the round calculation result;
calculating, according to a single-round calculation instruction, the state word of the next state from the state word of the previous state and the message word and message parameter corresponding to the current round;
and storing the state word of the next state in a first destination operand of a first destination operand register, wherein the bit width of the first destination operand is the same as the bit width of the round calculation result.
In a second aspect, an embodiment of the present application provides a processor, including a round calculation unit and operand registers. The operand registers include a first source operand register and a first destination operand register. The first source operand register holds a first source operand, which comprises a plurality of message words and message parameters and has the same bit width as the round calculation result; the first destination operand register holds a first destination operand, which also has the same bit width as the round calculation result.
The processor is configured with a single-round calculation instruction. The round calculation unit is configured to: read the message word and message parameter corresponding to the current round of calculation from the first source operand of the first source operand register; calculate, according to the single-round calculation instruction, the state word of the next state from the state word of the previous state and the message word and message parameter corresponding to the current round; and store the state word of the next state in the first destination operand of the first destination operand register.
In a third aspect, an embodiment of the present application provides a chip including the processor as described above.
In a fourth aspect, an embodiment of the present application provides an electronic device including the chip as described above.
In the embodiments of the application, the first source operand register and the first destination operand register have the same bit width as the round calculation result. The first source operand in the first source operand register can therefore hold several message words and message parameters for the round calculation, and the first destination operand in the first destination operand register can hold the complete result of each round, so that the round calculation of the SM3 algorithm operates on source and destination operands as wide as the round calculation result. In each round, the message word and message parameter for the current round are read from the first source operand; the state word of the next state is calculated by a single-round calculation instruction from the state word of the previous state and that message word and message parameter; and the state word of the next state is stored in the first destination operand. Because the operand registers match the width of the round calculation result, their full bit width is used at that vector granularity and each round is completed by one instruction instead of several, which speeds up the round calculation and hence the SM3 algorithm as a whole.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a diagram of an example processor implementation of the SM3 algorithm for 128-bit operand registers.
Fig. 2 is a diagram illustrating an example of a processor implementation of the SM3 algorithm provided in an embodiment of the present application.
Fig. 3 is a flowchart of a processing method of the SM3 algorithm according to an embodiment of the present disclosure.
Fig. 4 is a flowchart of another processing method of the SM3 algorithm according to an embodiment of the present disclosure.
FIG. 5 is a diagram illustrating an exemplary hardware implementation of a round calculation according to an embodiment of the present disclosure.
Fig. 6A is a flowchart of still another processing method of the SM3 algorithm according to an embodiment of the present disclosure.
Fig. 6B is a flowchart of a method for calculating a state word in the SM3 algorithm according to an embodiment of the present application.
Fig. 7 is an exemplary diagram of a round calculation unit provided in an embodiment of the present application.
Fig. 8 is a diagram of another example of hardware for performing round calculation according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The SM3 algorithm pads a message shorter than $2^{64}$ bits to a length that is a multiple of 512 bits, divides the padded message into 512-bit message blocks, and iteratively compresses the blocks in order, each compression outputting a 256-bit hash value for that block (the value after the last block is the final hash). In the SM3 algorithm, the iterative compression of a message block mainly involves message expansion and round calculation.
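The padding and block-splitting steps can be illustrated with a short Python sketch. It assumes byte-aligned input (SM3 itself also accepts bit lengths that are not multiples of 8), and the names `sm3_pad` and `split_blocks` are illustrative, not taken from the patent. Padding appends a single '1' bit, then '0' bits until the length is 448 mod 512, then the original bit length as a 64-bit big-endian integer.

```python
def sm3_pad(message: bytes) -> bytes:
    """Pad a byte-aligned message to a multiple of 512 bits as SM3 requires."""
    bit_len = len(message) * 8
    if bit_len >= 2 ** 64:
        raise ValueError("SM3 accepts messages shorter than 2**64 bits")
    padded = message + b"\x80"  # the '1' bit followed by seven '0' bits
    # Zero bytes until the length is 56 mod 64 bytes (448 mod 512 bits)
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += bit_len.to_bytes(8, "big")  # 64-bit big-endian length field
    return padded


def split_blocks(padded: bytes) -> list[bytes]:
    """Split a padded message into 512-bit (64-byte) message blocks."""
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]
```

A 3-byte message such as `b"abc"` fits in one block, while a 56-byte message spills into a second block because the length field no longer fits after the `0x80` byte.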
In message expansion, a message block is divided into 16 initial message words $W_0$ to $W_{15}$, from which message expansion generates 68 message words $W_0$ to $W_{67}$ and 64 message parameters $W'_0$ to $W'_{63}$. Each message parameter $W'_i$ is obtained from message words $W_i$ and $W_{i+4}$, i.e. $W'_i = W_i \oplus W_{i+4}$ for $i \in [0, 63]$. Message words $W_{16}$ to $W_{67}$ are generated by expansion from the initial 16 message words $W_0$ to $W_{15}$, for example:
$$W_m \leftarrow P_1\big(W_{m-16} \oplus W_{m-9} \oplus (W_{m-3} \lll 15)\big) \oplus (W_{m-13} \lll 7) \oplus W_{m-6}, \quad m \in [16, 67],$$
where $P_1(X) = X \oplus (X \lll 15) \oplus (X \lll 23)$ is the permutation function used in message expansion.
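A minimal Python sketch of the expansion just described, assuming big-endian 32-bit words as in the SM3 standard; `rotl`, `p1` and `expand` are hypothetical helper names rather than anything defined in the patent.

```python
MASK32 = 0xFFFFFFFF


def rotl(x: int, n: int) -> int:
    """32-bit circular left shift (<<<)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p1(x: int) -> int:
    """Permutation P1 used by SM3 message expansion."""
    return x ^ rotl(x, 15) ^ rotl(x, 23)


def expand(block: bytes) -> tuple[list[int], list[int]]:
    """Return the 68 message words W_0..W_67 and 64 message parameters W'_0..W'_63."""
    if len(block) != 64:
        raise ValueError("a message block is 512 bits (64 bytes)")
    # The 16 initial message words, read as big-endian 32-bit integers
    w = [int.from_bytes(block[4 * i:4 * i + 4], "big") for i in range(16)]
    for m in range(16, 68):
        w.append(p1(w[m - 16] ^ w[m - 9] ^ rotl(w[m - 3], 15)) ^ rotl(w[m - 13], 7) ^ w[m - 6])
    # Message parameters W'_i = W_i XOR W_{i+4}
    w_prime = [w[i] ^ w[i + 4] for i in range(64)]
    return w, w_prime
```

Computing the parameters after all words exist keeps the sketch simple; a hardware pipeline would more likely derive each $W'_i$ as soon as $W_{i+4}$ is available.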
In round calculation, the SM3 algorithm starts from 8 initial state words $(A_0, B_0, C_0, D_0, E_0, F_0, G_0, H_0)$, incorporates the message word and message parameter corresponding to each round, and obtains the 8 state words output by the final round through multiple iterated rounds; these 8 state words, XORed with the block's input state words, form the 256-bit hash value. Each round of calculation uses the message word and message parameter corresponding to that round.
In each round, the processor reads the source operands for the round (such as the message word and message parameter corresponding to the current round) from source operand registers and writes the computed state words to a destination operand register. On a processor with 128-bit operand registers, SM3 round calculation is performed at a 128-bit vector granularity, which does not match the 256-bit result of a round, and this limits the processing speed of SM3. That is, because of the bit width limit of the operand registers (source and destination), each round of SM3 must be carried out by multiple round calculation instructions, which limits the processing speed of SM3.
For illustration, Fig. 1 shows an example processor implementation of the SM3 algorithm based on 128-bit operand registers. As shown in Fig. 1, the instruction set of the processor includes, for round calculation, an SM3 two-round four-state-word update instruction 111 and an SM3 two-round remaining-four-state-word update instruction 112; the operand registers of the processor include 128-bit source operand registers 121 and 122 and 128-bit destination operand registers 131 and 132.
Since message words and message parameters are 32 bits each, the 128-bit source operand register 121 can store the message words for 4 rounds and the 128-bit source operand register 122 can store the message parameters for 4 rounds. However, a destination operand register is only 128 bits wide, whereas the result of one round is 256 bits (8 state words of 32 bits each, which together form the 256-bit hash value), so the processor needs two instructions to compute the state words of the current round. For example, the SM3 two-round four-state-word update instruction 111 computes 4 state words of the current round from the message words and message parameters in source operand registers 121 and 122 and stores them in destination operand register 131; the SM3 two-round remaining-four-state-word update instruction 112 computes the remaining 4 state words of the current round from the same source operands and stores them in destination operand register 132.
Thus, under the bit width limit of 128-bit operand registers, SM3 round calculation is performed at a 128-bit vector granularity and each round must be split across multiple instructions, which limits the processing speed of the SM3 algorithm. Moreover, as high-performance processors develop, their vector units provide wider operands and more parallel computation; performing SM3 only at a 128-bit vector granularity cannot take advantage of this added capability.
On this basis, the embodiments of the application provide an improved processing scheme for the SM3 algorithm: a single-round calculation instruction is provided on top of operand registers whose bit width equals that of the round calculation result (e.g., 256-bit operand registers), so that round calculation is performed at a vector granularity equal to the width of the round calculation result (e.g., 256 bits), increasing the processing speed of the SM3 algorithm. In other words, because the operand register width matches the round calculation result, the full register width is used and each round of calculation is completed by a single-round calculation instruction.
As an optional implementation, taking a processor with 256-bit operand registers as an example, Fig. 2 shows an example processor implementation of the SM3 algorithm provided by an embodiment of the present application. Note that the bit width of the operand registers only needs to equal the bit width of the round calculation result; because the round calculation result is 256 bits, this embodiment is described with 256-bit operand registers. Referring to Fig. 2, the processor may include a round calculation unit 210 for round calculation and a message extension unit 220 for message expansion. The instruction set of the processor includes a single-round calculation instruction 211 for round calculation (the round calculation unit 210 executes the single-round calculation instruction 211 to perform the round calculation of the SM3 algorithm) and a message expansion instruction 221 for message expansion (the message extension unit 220 executes the message expansion instruction 221 to perform the message expansion of the SM3 algorithm). Just as the single-round calculation instruction 211 performs round calculation at a 256-bit vector granularity, the message expansion instruction 221 provided by the embodiments can also perform message expansion at a 256-bit vector granularity. In one example, the single-round calculation instruction 211 may be named VSM3RND256 and the message expansion instruction 221 may be named VSM3MSG256.
As further shown in fig. 2, operand registers may be provided in the processor, and the operand registers provided may include: a first source operand register 231, a second source operand register 232, a third source operand register 233, a first destination operand register 241, and a second destination operand register 242; the bit width of each operand register (including each source operand register and each destination operand register) is the same as the bit width of the round calculation result, for example, the bit width of each operand register is 256 bits.
In this embodiment, the first source operand register 231 and the first destination operand register 241 are used for round calculation. As an optional implementation, the first source operand register 231 may store message words and message parameters totaling 256 bits; for example, it may hold the 4 message words $(W_j, W_{j+1}, W_{j+2}, W_{j+3})$ and the 4 message parameters $(W'_j, W'_{j+1}, W'_{j+2}, W'_{j+3})$ for 4 rounds, where j is the current round number of the round calculation, the message word index $j+3$ does not exceed 67, and the message parameter index $j+3$ does not exceed 63.
In the current round, the round calculation unit 210 may execute the single-round calculation instruction 211 to compute the 8 state words of the current round from the message word and message parameter for the current round stored in the first source operand register 231, and store the resulting 8 state words in the first destination operand register 241.
The second source operand register 232, the third source operand register 233 and the second destination operand register 242 are used for message expansion. As an optional implementation, the second source operand register 232 may store the 8 already-obtained message words with the highest indices, and the third source operand register 233 may store the 8 message words immediately preceding them. During message expansion, the message extension unit 220 may execute the message expansion instruction 221 to expand the next 8 message words from the message words stored in the second source operand register 232 and the third source operand register 233, and store the 8 expanded message words in the second destination operand register 242.
For example, when message expansion starts from the 16 initially divided message words $W_0$ to $W_{15}$, the second source operand register 232 may store the 8 message words $W_8$ to $W_{15}$ (each message word is 32 bits), and the third source operand register 233 may store the 8 message words $W_0$ to $W_7$ that precede them. The message extension unit 220 may execute the message expansion instruction 221 to expand the 16 message words $W_0$ to $W_{15}$ held in these two registers into the next 8 message words $W_{16}$ to $W_{23}$, which may be stored in the second destination operand register 242.
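The following Python model sketches what one 256-bit message-expansion step computes. It is a behavioral sketch, not the patent's hardware; the register-to-list mapping (`src2` holds $W_{j-8}..W_{j-1}$, `src3` holds $W_{j-16}..W_{j-9}$) and the name `msg_expand8` are assumptions. Because $W_m$ depends on $W_{m-3}$, output lanes 3 to 7 depend on lanes 0 to 4 of the same step, so the eight new words cannot all be computed independently.

```python
MASK32 = 0xFFFFFFFF


def rotl(x: int, n: int) -> int:
    """32-bit circular left shift (<<<)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p1(x: int) -> int:
    """Permutation P1 used by SM3 message expansion."""
    return x ^ rotl(x, 15) ^ rotl(x, 23)


def msg_expand8(src2: list[int], src3: list[int]) -> list[int]:
    """Hypothetical model of one 256-bit expansion step.

    src2: W_{j-8}..W_{j-1}   (the 8 most recent message words)
    src3: W_{j-16}..W_{j-9}  (the 8 words before them)
    returns W_j..W_{j+7}
    """
    if len(src2) != 8 or len(src3) != 8:
        raise ValueError("each source operand holds 8 message words")
    window = list(src3) + list(src2)  # window[k] == W_{j-16+k}
    for m in range(16, 24):
        # Later lanes reuse words produced earlier in this same step (via W_{m-3})
        window.append(p1(window[m - 16] ^ window[m - 9] ^ rotl(window[m - 3], 15))
                      ^ rotl(window[m - 13], 7) ^ window[m - 6])
    return window[16:]
```

Starting from `w = [W_0, ..., W_15]`, the call `msg_expand8(w[8:16], w[0:8])` yields $W_{16}$ to $W_{23}$; feeding the newest 16 words back in the same way continues the expansion.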
Based on the single-round calculation instruction and the message expansion instruction provided by the embodiment of the present application, a processing method of the SM3 algorithm provided by the embodiment of the present application is described below. As an alternative implementation, fig. 3 exemplarily illustrates a flowchart of a processing method of the SM3 algorithm provided in an embodiment of the present application, and the method flow may be implemented by a processor executing a single-round calculation instruction (for example, a round calculation unit in the processor executes the single-round calculation instruction, and implements the method flow). Referring to fig. 3, the method flow may include the following steps.
In step S31, the message word and message parameter corresponding to the current round of calculation are read from the first source operand of the first source operand register, where the first source operand comprises a plurality of message words and message parameters and its bit width is the same as the bit width of the round calculation result.
In the embodiments of the application, when a first source operand register with the same bit width as the round calculation result is provided, the first source operand it holds has the same bit width as the round calculation result. For example, a 256-bit first source operand register holds a 256-bit first source operand. The first source operand can be regarded as the source operand of the round calculation; since the round calculation uses message words and message parameters, the first source operand may include a plurality of message words and message parameters whose total bit width equals the bit width of the round calculation result. For example, if the current round is the j-th round, the first source operand may include the 4 message words $(W_j, W_{j+1}, W_{j+2}, W_{j+3})$ and the 4 message parameters $(W'_j, W'_{j+1}, W'_{j+2}, W'_{j+3})$ for 4 rounds.
With the first source operand stored in the first source operand register, the embodiments of the application may, when performing the current round of calculation, read the message word and message parameter corresponding to the current round from the first source operand. As an optional implementation, the message word and message parameter whose index corresponds to the current round number are read from the first source operand. In one example, round calculation is performed for 64 rounds, and in the j-th round the message word $W_j$ and the message parameter $W'_j$ are read from the first source operand.
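The operand layout and the per-round selection can be sketched in Python by modelling the 256-bit first source operand as an integer. The lane order below (message words in the low 128 bits, message parameters in the high 128 bits, lane 0 least significant) and the names `pack_src1` and `read_round_inputs` are assumptions for illustration; the patent does not specify a bit layout here.

```python
MASK32 = 0xFFFFFFFF


def pack_src1(words: list[int], params: list[int]) -> int:
    """Pack (W_j..W_{j+3}) and (W'_j..W'_{j+3}) into one 256-bit operand.

    Assumed layout: lane k of the low 128 bits holds W_{j+k},
    lane k of the high 128 bits holds W'_{j+k}; lane 0 is least significant.
    """
    if len(words) != 4 or len(params) != 4:
        raise ValueError("expected 4 message words and 4 message parameters")
    operand = 0
    for k in range(4):
        operand |= (words[k] & MASK32) << (32 * k)
        operand |= (params[k] & MASK32) << (128 + 32 * k)
    return operand


def read_round_inputs(operand: int, j: int, base: int) -> tuple[int, int]:
    """Select (W_j, W'_j) for round j from an operand loaded for rounds base..base+3."""
    k = j - base
    if not 0 <= k < 4:
        raise ValueError("round j is not covered by this operand")
    w_j = (operand >> (32 * k)) & MASK32
    wp_j = (operand >> (128 + 32 * k)) & MASK32
    return w_j, wp_j
```

Under this layout, an operand loaded at round `base` serves four consecutive rounds, and the round number alone decides which pair of lanes is read.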
In step S32, the state word of the next state is calculated according to the single-round calculation instruction, using the state word of the previous state and the message word and message parameter corresponding to the current round.
After reading the message word and the message parameter corresponding to the current round of calculation from the first source operand stored in the first source operand register, the embodiment of the present application may calculate the state word of the next state according to the single round of calculation instruction, where the state word of the next state may be regarded as the calculation result of the current round of calculation. Optionally, when the single-round calculation instruction calculates the state word of the next state according to the state word of the previous state, the message word and the message parameter corresponding to the current round calculation may be added, so as to obtain the calculation result of the current round.
In one example, assuming the j-th round is currently being performed, the single-round calculation instruction may be executed to calculate the state word of the next state $(A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1})$ from the already-obtained state word of the previous state $(A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i)$ and the message word $W_j$ and message parameter $W'_j$ corresponding to the current round. For example, in the first round the state word of the previous state is the initial state word $(A_0, B_0, C_0, D_0, E_0, F_0, G_0, H_0)$ and the calculated state word of the next state is $(A_1, B_1, C_1, D_1, E_1, F_1, G_1, H_1)$, and so on.
In some embodiments, when the single-round calculation instruction is executed to calculate the state word of the next state, the first partial state word of the next state may be calculated from the state word of the previous state; intermediate variables may be calculated from the state word of the previous state and the message word and message parameter corresponding to the current round; and the second partial state word of the next state may be determined from the intermediate variables. The state word is thus divided into a first partial state word and a second partial state word, and the calculated first and second partial state words of the next state together form the state word of the next state.
As an optional implementation, of the 8 state words (A, B, C, D, E, F, G, H), (B, C, D, F, G, H) may be set as the first partial state word and (A, E) as the second partial state word. The next state of the first partial state word (B, C, D, F, G, H) can then be determined from the state word of the previous state alone, whereas for the next state of the second partial state word (A, E), intermediate variables are first determined from the state word of the previous state and the message word and message parameter corresponding to the current round, and the next state of the second partial state word is then determined from these intermediate variables.
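That the first partial state word needs no message input can be made explicit with a small Python fragment. The name `first_partial` is hypothetical; the copies and the rotations by 9 and 19 follow the SM3 round formulas given in this section.

```python
MASK32 = 0xFFFFFFFF


def rotl(x: int, n: int) -> int:
    """32-bit circular left shift (<<<)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def first_partial(state: tuple[int, ...]) -> tuple[int, ...]:
    """Next-state (B, C, D, F, G, H) computed from the previous state only.

    D_i and H_i are simply dropped: they feed TT1/TT2, i.e. the second partial word.
    """
    a, b, c, d, e, f, g, h = state
    # B' = A, C' = B <<< 9, D' = C, F' = E, G' = F <<< 19, H' = G
    return (a, rotl(b, 9), c, e, rotl(f, 19), g)
```

Since these six words are only copies or fixed rotations, hardware can produce them in parallel with the adder chain that computes TT1 and TT2 for (A, E).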
In one example, assume the current round number of the round calculation is j (j may be supplied as the 8-bit immediate imm8) and the result of the round calculation is $(A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1})$. From the already-obtained previous state word $(A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i)$ and the message word $W_j$ and message parameter $W'_j$ for the current round, executing the single-round calculation instruction calculates the first partial state word of the next state $(B_{i+1}, C_{i+1}, D_{i+1}, F_{i+1}, G_{i+1}, H_{i+1})$ and the intermediate variables SS1, SS2, TT1 and TT2; the second partial state word of the next state $(A_{i+1}, E_{i+1})$ is then calculated from the intermediate variables TT1 and TT2.
As an alternative implementation, the calculation process of the single-round calculation instruction may be represented by the following formula:
SS1←((A i <<<12)+E i +(T j <<<j))<<<7;
Figure BDA0003632295780000091
TT1←FF j (A i ,B i ,C i )+D i +SS2+W j ’;
TT2←GG j (E i ,F i ,G i )+H i +SS1+W j
D i+1 ←C i
C i+1 ←B i <<<9;
B i+1 ←A i
A i+1 ←TT1;
H i+1 ←G i
G i+1 ←F i <<<19;
F i+1 ←E i
E i+1 ←P 0 (TT2)。
where $\oplus$ denotes a 32-bit exclusive-OR; $\lll$ denotes a 32-bit cyclic left shift (shift amounts are taken modulo 32); $\leftarrow$ denotes assignment; $+$ denotes addition modulo $2^{32}$; $T_j$ is an algorithm constant whose value depends on $j$ ($T_j$ = 79CC4519 for $0 \le j \le 15$ and 7A879D8A for $16 \le j \le 63$, in hexadecimal); and $P_0$ is the permutation function used in round calculation:
$$P_0(X) = X \oplus (X \lll 9) \oplus (X \lll 17)$$
$FF_j$ and $GG_j$ are Boolean functions whose expressions depend on $j$:
$$FF_j(X, Y, Z) = \begin{cases} X \oplus Y \oplus Z, & 0 \le j \le 15 \\ (X \wedge Y) \vee (X \wedge Z) \vee (Y \wedge Z), & 16 \le j \le 63 \end{cases}$$
$$GG_j(X, Y, Z) = \begin{cases} X \oplus Y \oplus Z, & 0 \le j \le 15 \\ (X \wedge Y) \vee (\neg X \wedge Z), & 16 \le j \le 63 \end{cases}$$
where $\wedge$ denotes a 32-bit AND, $\vee$ denotes a 32-bit OR, and $\neg$ denotes a 32-bit NOT.
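To make the formulas above concrete, the following is a minimal Python sketch of one SM3 round in software, assuming Python integers masked to 32 bits. The function names (`rotl`, `t_const`, `ff`, `gg`, `p0`, `sm3_round`) are hypothetical and illustrate only the arithmetic, not the single round calculation instruction or its hardware.

```python
MASK32 = 0xFFFFFFFF


def rotl(x: int, n: int) -> int:
    """32-bit cyclic left shift; the shift amount is taken modulo 32."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def t_const(j: int) -> int:
    """Round constant T_j."""
    return 0x79CC4519 if j <= 15 else 0x7A879D8A


def ff(j: int, x: int, y: int, z: int) -> int:
    if j <= 15:
        return x ^ y ^ z
    return (x & y) | (x & z) | (y & z)


def gg(j: int, x: int, y: int, z: int) -> int:
    if j <= 15:
        return x ^ y ^ z
    return (x & y) | (~x & MASK32 & z)


def p0(x: int) -> int:
    return x ^ rotl(x, 9) ^ rotl(x, 17)


def sm3_round(state, j, w_j, w_prime_j):
    """One round: (A_i, ..., H_i) -> (A_{i+1}, ..., H_{i+1})."""
    a, b, c, d, e, f, g, h = state
    a12 = rotl(a, 12)
    ss1 = rotl((a12 + e + rotl(t_const(j), j)) & MASK32, 7)
    ss2 = ss1 ^ a12
    tt1 = (ff(j, a, b, c) + d + ss2 + w_prime_j) & MASK32
    tt2 = (gg(j, e, f, g) + h + ss1 + w_j) & MASK32
    # B, C, D, F, G, H of the next state are moves or fixed rotations of the
    # previous state; only A and E depend on the intermediate variables.
    return (tt1, a, rotl(b, 9), c, p0(tt2), e, rotl(f, 19), g)
```

For example, `sm3_round(state, j, W[j], W[j] ^ W[j + 4])` maps $(A_i, \ldots, H_i)$ to $(A_{i+1}, \ldots, H_{i+1})$; the return statement makes the split visible, since only the first and fifth outputs (A and E) depend on TT1 and TT2.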
In step S33, the state words of the next state are stored in the first destination operand of the first destination operand register, where the bit width of the first destination operand equals the bit width of the round calculation result.
After the state words of the next state are computed, the embodiment stores them in the first destination operand, which is held in the first destination operand register and whose bit width equals that of the round calculation result. For example, a 256-bit first destination operand register can hold a 256-bit first destination operand; the 256-bit state words of the next state $(A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1})$ computed by the embodiment are stored in that operand, making full use of the bit width of the first destination operand register.
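A minimal sketch of how eight 32-bit state words fill one 256-bit destination operand, modeled with a Python integer. The lane order (A in the most significant 32 bits) is an assumption, since the text does not specify it, and `pack256`/`unpack256` are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF


def pack256(words) -> int:
    """Pack eight 32-bit state words into one 256-bit value.

    Assumed (hypothetical) lane order: A in bits 255..224, ..., H in bits 31..0.
    """
    if len(words) != 8:
        raise ValueError("expected 8 state words")
    value = 0
    for w in words:
        value = (value << 32) | (w & MASK32)
    return value


def unpack256(value: int):
    """Inverse of pack256: return (A, B, C, D, E, F, G, H)."""
    return tuple((value >> (32 * (7 - k))) & MASK32 for k in range(8))
```

Under this assumption, `ymm0 = pack256(next_state)` models writing a round result to the destination operand, and `unpack256(ymm0)` recovers the eight state words for the next round.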
In some embodiments, the round calculation unit in the processor may execute the single round calculation instruction provided in the embodiments of the present application to implement the method flow illustrated in fig. 3 described above.
By providing a first source operand register and a first destination operand register whose bit width equals that of the round calculation result, the embodiment can hold several message words and message parameters for round calculation in the first source operand and the result of each round in the first destination operand, so that SM3 round calculation operates on source and destination operands as wide as the round calculation result. During the current round, the embodiment reads the message word and message parameter of the current round from the first source operand of the first source operand register; executes the single round calculation instruction to compute the state words of the next state from the state words of the previous state and that message word and message parameter; and stores the state words of the next state in the first destination operand of the first destination operand register. The full width of the operand registers is therefore used at a vector granularity equal to the round calculation result, and each round is carried out by a single instruction, which speeds up SM3 round calculation and hence SM3 processing as a whole.
In one implementation example, the source operand for round calculation is ymm1, the destination operand is ymm0, and the current round number $j$ is held in the immediate imm8. Because the round calculation result is 256 bits, ymm1 and ymm0 are both 256 bits wide and reside in a 256-bit source operand register and a 256-bit destination operand register, respectively. With the single round calculation instruction VSM3RND256 provided by the embodiment, one round is computed at 256-bit vector granularity on these registers and completes in 2 beats. For example, in the $j$-th round, ymm1 holds the 4 message words $(W_j, W_{j+1}, W_{j+2}, W_{j+3})$ and the 4 message parameters $(W'_j, W'_{j+1}, W'_{j+2}, W'_{j+3})$ of 4 rounds, 256 bits in total; VSM3RND256 computes the result of the $j$-th round $(A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1})$ in 2 beats according to the formulas above and stores it in ymm0.
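The following sketch illustrates how 4 message words and 4 message parameters can fill the 256-bit ymm1 operand, assuming (as in the SM3 standard) $W'_k = W_k \oplus W_{k+4}$. The lane order and the helper names `message_operand` and `operands_for_round` are hypothetical; the actual layout used by VSM3RND256 is not given in the text.

```python
MASK32 = 0xFFFFFFFF


def message_operand(w, j: int) -> int:
    """Build a hypothetical 256-bit ymm1 value for rounds j..j+3.

    Assumed lane order (most significant first): W_j..W_{j+3}, then
    W'_j..W'_{j+3}, where W'_k = W_k XOR W_{k+4}. Requires w[j + 7] to exist.
    """
    lanes = [w[j + k] for k in range(4)] + [w[j + k] ^ w[j + k + 4] for k in range(4)]
    value = 0
    for lane in lanes:
        value = (value << 32) | (lane & MASK32)
    return value


def operands_for_round(ymm1: int, r: int):
    """Return (W_{j+r}, W'_{j+r}) for r in 0..3 under the assumed layout."""
    def lane(k: int) -> int:
        return (ymm1 >> (32 * (7 - k))) & MASK32
    return lane(r), lane(4 + r)
```

One ymm1 value built this way serves four consecutive rounds, with the round offset $r = 0..3$ selecting the lane pair used by that round.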
In further embodiments, message expansion can also use source and destination operand registers whose bit width equals that of the round calculation result, so that one round of message word expansion is completed by a single message expansion instruction; for example, a single message expansion instruction can expand 8 message words in 1 beat. As an alternative implementation, FIG. 4 shows another processing method flow of the SM3 algorithm provided by the embodiment; the flow may be implemented by a processor executing a single message expansion instruction (for example, by a message expansion unit in the processor executing the instruction). Referring to FIG. 4, the flow may include the following steps.
In step S41, the message words for message expansion are read from the second source operand of the second source operand register and the third source operand of the third source operand register. The bit widths of the second and third source operands equal the bit width of the round calculation result; the second source operand contains the first number of already-obtained message words with the largest sequence numbers, and the third source operand contains the first number of message words immediately preceding those in the second source operand.
When second and third source operand registers whose bit width equals that of the round calculation result are provided, the second source operand is held in the second source operand register and the third source operand in the third source operand register, each as wide as the round calculation result. For example, with 256-bit second and third source operand registers, the second and third source operands are each 256 bits.
The second and third source operands serve as the source operands for message expansion. In some embodiments, a single message expansion instruction expands, in one beat (one beat may be regarded as one clock cycle of the processor), a first number of message words whose total bit width equals that of the round calculation result. As an alternative implementation, the second source operand contains the first number of already-obtained message words with the largest sequence numbers and the third source operand contains the first number of message words preceding them, so that a single message expansion instruction can use the two operands to expand, in one beat, the first number of message words that follow the second source operand.
In one example, the round calculation result is 256 bits and a message word is 32 bits, so the second source operand contains the 8 already-obtained message words with the largest sequence numbers, the third source operand contains the 8 message words preceding them, and a single message expansion instruction uses the two operands to expand, in one beat, the 8 message words that follow the second source operand. For example, starting from $W_0$ to $W_{15}$ obtained by the initial partitioning of the message block, the second source operand holds the 8 message words with the largest sequence numbers, $W_8$ to $W_{15}$, and the third source operand holds the preceding 8 message words, $W_0$ to $W_7$. From $W_0$ to $W_{15}$, a single message expansion instruction expands $W_{16}$ to $W_{23}$ in one beat. To expand the next 8 message words $W_{24}$ to $W_{31}$, the operand contents are shifted accordingly: the second source operand holds the latest 8 message words $W_{16}$ to $W_{23}$ and the third source operand holds the preceding 8 message words $W_8$ to $W_{15}$; subsequent groups are expanded in the same way.
Because the message words are stored in the second and third source operands, expanding a message word consists of reading the required earlier message words from those operands and computing the next word from them. As an alternative implementation, to expand a message word the embodiment reads, from the second and third source operands, the message words at specified distances before it: the 16th, 13th, 9th, 6th and 3rd message words preceding the word to be expanded. In one example, to expand the message word $W_m$, the 5 message words $W_{m-16}$, $W_{m-13}$, $W_{m-9}$, $W_{m-6}$ and $W_{m-3}$ may be read from the second and third source operands.
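A small sketch, under an assumed operand layout, that reports where each of the five inputs of $W_m$ lives when one instruction produces the eight words $W_{base}$ to $W_{base+7}$. It shows that, for the fourth to eighth words of a group, $W_{m-3}$ (and, for the seventh and eighth words, also $W_{m-6}$) is one of the words being produced by the same instruction rather than a word held in a source operand. The text does not describe how the hardware resolves this, so the sketch only documents the data flow; `operand_sources` is a hypothetical helper.

```python
def operand_sources(m: int, base: int) -> dict:
    """Where each input of W_m comes from when one instruction produces W_base..W_{base+7}.

    Assumed (hypothetical) layout: the third source operand holds
    W_{base-16}..W_{base-9} and the second source operand holds W_{base-8}..W_{base-1}.
    """
    if not base <= m < base + 8:
        raise ValueError("m must be one of the eight words produced by this instruction")
    sources = {}
    for offset in (16, 13, 9, 6, 3):
        idx = m - offset
        if idx < base - 8:
            sources[offset] = "third source operand"
        elif idx < base:
            sources[offset] = "second source operand"
        else:
            # W_idx is itself one of the eight words produced by the same instruction
            sources[offset] = "generated in this instruction"
    return sources
```

For instance, with `base = 16`, `operand_sources(19, 16)[3]` reports that $W_{16}$ is needed as the $W_{m-3}$ input of $W_{19}$, although $W_{16}$ is produced in the same step.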
In step S42, a first number of message words following the second source operand are generated by expansion using the read message word according to a single message expansion instruction.
After reading the message words for expansion from the second and third source operands, the embodiment executes a single message expansion instruction to generate, in one beat, the first number of message words following the second source operand; that is, a group of message words whose total bit width equals that of the round calculation result is expanded per beat.
In one example, after reading from the second and third source operands the message words at the specified distances before the word to be expanded, the embodiment expands the word using the permutation function of message expansion and 32-bit exclusive-OR operations. For example, to expand the message word $W_m$, after reading $W_{m-16}$, $W_{m-13}$, $W_{m-9}$, $W_{m-6}$ and $W_{m-3}$ from the second and third source operands, the embodiment computes $W_m$ by the following formula:
$$W_m \leftarrow P_1\big(W_{m-16} \oplus W_{m-9} \oplus (W_{m-3} \lll 15)\big) \oplus (W_{m-13} \lll 7) \oplus W_{m-6}$$
where $P_1$ is the permutation function used in message expansion; for a message word $X$, $P_1(X)$ can be expressed as:
$$P_1(X) = X \oplus (X \lll 15) \oplus (X \lll 23)$$
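A direct Python transcription of the expansion formula and $P_1$, for illustration; `expand_word` and `p1` are hypothetical names, and the 32-bit masking is an implementation assumption of the sketch.

```python
MASK32 = 0xFFFFFFFF


def rotl(x: int, n: int) -> int:
    """32-bit cyclic left shift."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p1(x: int) -> int:
    """Permutation function of message expansion."""
    return x ^ rotl(x, 15) ^ rotl(x, 23)


def expand_word(w, m: int) -> int:
    """W_m from W_{m-16}, W_{m-13}, W_{m-9}, W_{m-6} and W_{m-3} (requires m >= 16)."""
    return (p1(w[m - 16] ^ w[m - 9] ^ rotl(w[m - 3], 15))
            ^ rotl(w[m - 13], 7)
            ^ w[m - 6])


# Usage: with only W_0 = 1 set, W_16 reduces to P1(1).
w = [0] * 16
w[0] = 1
w.append(expand_word(w, 16))  # W_16 = 0x00808001
```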
In step S43, the message words generated by the expansion are stored in the second destination operand of the second destination operand register, where the bit width of the second destination operand equals the bit width of the round calculation result.
Because the total bit width of the first number of expanded message words that follow the second source operand equals that of the round calculation result, the embodiment stores the expanded message words in the second destination operand of the second destination operand register.
In some embodiments, a message extension unit in the processor may execute the message extension instruction provided in the embodiments of the present application to implement the method flow illustrated in fig. 4 described above.
By providing second and third source operand registers and a second destination operand register whose bit width equals that of the round calculation result, the embodiment holds the already-obtained message words with the largest sequence numbers in the second source operand, the first number of message words preceding them in the third source operand, and the first number of expanded message words following the second source operand in the second destination operand, so that SM3 message expansion operates on source and destination operands as wide as the round calculation result. Within one beat, the embodiment reads the message words for expansion from the second source operand of the second source operand register and the third source operand of the third source operand register, expands the first number of message words from them according to a single message expansion instruction, and stores the expanded message words in the second destination operand of the second destination operand register.
With operand registers as wide as the round calculation result, the embodiment makes full use of the register width and generates, in one beat and with a single message expansion instruction, a group of message words whose total width equals the round calculation result, which speeds up SM3 message expansion and thus SM3 processing as a whole.
In one example, the source operands for message expansion are ymm1 and ymm2 and the destination operand is ymm0. Because the round calculation result is 256 bits, ymm1, ymm2 and ymm0 are all 256 bits wide; ymm1 and ymm2 reside in 256-bit source operand registers and ymm0 resides in a 256-bit destination operand register. With the message expansion instruction VSM3MSG256 provided by the embodiment, 8 message words are expanded within 1 beat at 256-bit vector granularity. For example, starting from the initially partitioned message words $W_0$ to $W_{15}$, ymm1 holds $W_8$ to $W_{15}$ and ymm2 holds $W_0$ to $W_7$; VSM3MSG256 expands $W_{16}$ to $W_{23}$ in one beat according to the formula above and stores them in ymm0. Subsequent message words are expanded in the same way.
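The sliding-window use of ymm1 and ymm2 described above can be modeled in software as follows. This is a sketch, not the VSM3MSG256 microarchitecture: `expand8` and `message_schedule` are hypothetical names, and each 256-bit operand is represented as a list of eight 32-bit words. Inside `expand8` the eight words are produced in order, because later words of a group need earlier words of the same group.

```python
MASK32 = 0xFFFFFFFF


def rotl(x: int, n: int) -> int:
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p1(x: int) -> int:
    return x ^ rotl(x, 15) ^ rotl(x, 23)


def expand8(ymm1, ymm2):
    """Software model of one 8-word expansion step (hypothetical helper name).

    ymm2 holds W_{k-16}..W_{k-9}, ymm1 holds W_{k-8}..W_{k-1} (each 8 x 32 bits);
    returns W_k..W_{k+7}, i.e. the value that would be written to ymm0.
    """
    window = list(ymm2) + list(ymm1)  # window[t] = W_{k-16+t}
    out = []
    for t in range(16, 24):
        wm = (p1(window[t - 16] ^ window[t - 9] ^ rotl(window[t - 3], 15))
              ^ rotl(window[t - 13], 7) ^ window[t - 6])
        window.append(wm)  # later words of the group may use it as W_{m-3} or W_{m-6}
        out.append(wm)
    return out


def message_schedule(block_words):
    """W_0..W_67 and W'_0..W'_63 using repeated 8-word steps with a sliding window."""
    w = list(block_words)
    while len(w) < 68:
        w += expand8(w[-8:], w[-16:-8])  # ymm1 = newest 8 words, ymm2 = the 8 before
    w = w[:68]  # the last step also yields W_68..W_71, which SM3 does not use
    w_prime = [w[j] ^ w[j + 4] for j in range(64)]
    return w, w_prime
```

Seven such steps produce $W_{16}$ to $W_{71}$; SM3 needs words only up to $W_{67}$ (for $W'_{63} = W_{63} \oplus W_{67}$), so the sketch discards the last four.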
In further embodiments, based on the round calculation process described above, FIG. 5 schematically shows an optional hardware structure for round calculation that completes one round within 2 beats. As shown in FIG. 5, in the $j$-th round, the 8 state words of the next state $(A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1})$ are computed from the 8 state words of the previous state $(A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i)$, the message word $W_j$ and the message parameter $W'_j$ as follows:
In the first pipeline stage FX1, $A_i$, $C_i$, $E_i$ and $G_i$ are each latched for one beat into the second pipeline stage FX2 (the assignment operator ←), yielding $B_{i+1}$, $D_{i+1}$, $F_{i+1}$ and $H_{i+1}$ in the second stage; at the same time, $B_i$ is rotated left by 9 bits to produce $C_{i+1}$ in the second stage, and $F_i$ is rotated left by 19 bits to produce $G_{i+1}$ in the second stage.
In FX1, $T_j \lll j$ is computed from the current round number $j$, and the result is fed to the first 32-bit CSA unit (CSA32_1 in FIG. 5) together with $A_i \lll 12$ and $E_i$. CSA32_1 compresses its three inputs into 2 outputs, which enter the first 32-bit adder (Adder32_1 in FIG. 5); the sum from Adder32_1 is rotated left by 7 bits to give the intermediate variable SS1. Here a CSA (carry-save adder) is 32 bits wide: for inputs a, b and c it outputs sum and car, computed as $\mathrm{sum} = a \oplus b \oplus c$ and $\mathrm{car} = (a \wedge b) \vee (a \wedge c) \vee (b \wedge c)$. Since $a + b + c = \mathrm{sum} + 2 \cdot \mathrm{car}$, the car output carries twice the weight of sum when the two outputs are finally added.
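A minimal Python model of the CSA units and the final adder, useful for checking the identity $a + b + c = \mathrm{sum} + 2 \cdot \mathrm{car}$ modulo $2^{32}$. The names `csa32` and `add32` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def csa32(a: int, b: int, c: int):
    """32-bit carry-save adder: three inputs -> (sum, car), no carry propagation."""
    s = a ^ b ^ c
    car = (a & b) | (a & c) | (b & c)
    return s, car


def add32(s: int, car: int) -> int:
    """Carry-propagate adder after a CSA: car has weight 2, so it is shifted left by 1."""
    return (s + (car << 1)) & MASK32
```

Because the CSA only computes bitwise functions, its delay does not grow with the word width; the single carry-propagate addition is deferred to `add32`, which is the reason for placing CSA units ahead of each adder.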
Also in FX1, SS1 is XORed with $A_i \lll 12$ to obtain the intermediate variable SS2; SS1 and SS2 are placed in FX2 registers for use in the next beat.
At the same time, in FX1, $GG_j$ is computed from $j$, $E_i$, $F_i$ and $G_i$, and the $GG_j$ result, $H_i$ and $W_j$ are fed to the second 32-bit CSA unit (CSA32_2 in FIG. 5), whose 2 outputs are stored in FX2 registers for the next beat; likewise, $FF_j$ is computed from $j$, $A_i$, $B_i$ and $C_i$, and the $FF_j$ result, $D_i$ and $W'_j$ are fed to the third 32-bit CSA unit (CSA32_3 in FIG. 5), whose 2 outputs are stored in FX2 registers for the next beat.
In the next beat, SS1 and the 2 outputs of CSA32_2 held in the FX2 registers enter the fourth 32-bit CSA unit (CSA32_4 in FIG. 5); the 2 outputs of CSA32_4 enter the second 32-bit adder (Adder32_2 in FIG. 5) to produce TT2, and TT2 passes through the $P_0$ function hardware to give $E_{i+1}$.
Likewise, SS2 and the 2 outputs of CSA32_3 held in the FX2 registers enter the fifth 32-bit CSA unit (CSA32_5 in FIG. 5), whose 2 outputs enter the third 32-bit adder (Adder32_3 in FIG. 5) to produce TT1; TT1 is assigned (←) to give $A_{i+1}$.
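The two-beat datapath of FIG. 5 can be checked in software by splitting one round into an FX1 function and an FX2 function and comparing the result with a direct evaluation of the round formulas. This is a sketch under stated assumptions: the function names are hypothetical, the latched tuple stands in for the FX2 registers, and the car output of each CSA is shifted left by one bit (modulo $2^{32}$) before it is fed to the next CSA or adder.

```python
MASK32 = 0xFFFFFFFF


def rotl(x, n):
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def csa32(a, b, c):
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def add32(s, car):
    return (s + (car << 1)) & MASK32


def t_const(j): return 0x79CC4519 if j <= 15 else 0x7A879D8A
def ff(j, x, y, z): return x ^ y ^ z if j <= 15 else (x & y) | (x & z) | (y & z)
def gg(j, x, y, z): return x ^ y ^ z if j <= 15 else (x & y) | (~x & z)
def p0(x): return x ^ rotl(x, 9) ^ rotl(x, 17)


def stage_fx1(state, j, w, wp):
    """First beat: values latched into the FX2 registers."""
    a, b, c, d, e, f, g, h = state
    a12 = rotl(a, 12)
    ss1 = rotl(add32(*csa32(rotl(t_const(j), j), a12, e)), 7)  # CSA32_1 + Adder32_1
    ss2 = ss1 ^ a12
    csa2 = csa32(gg(j, e, f, g), h, w)    # CSA32_2
    csa3 = csa32(ff(j, a, b, c), d, wp)   # CSA32_3
    moved = (a, rotl(b, 9), c, e, rotl(f, 19), g)  # next B, C, D, F, G, H
    return ss1, ss2, csa2, csa3, moved


def stage_fx2(latched):
    """Second beat: finish TT1/TT2 and assemble the next state."""
    ss1, ss2, csa2, csa3, moved = latched
    tt2 = add32(*csa32(ss1, csa2[0], (csa2[1] << 1) & MASK32))  # CSA32_4 + Adder32_2
    tt1 = add32(*csa32(ss2, csa3[0], (csa3[1] << 1) & MASK32))  # CSA32_5 + Adder32_3
    b1, c1, d1, f1, g1, h1 = moved
    return (tt1, b1, c1, d1, p0(tt2), f1, g1, h1)


def round_reference(state, j, w, wp):
    """Direct evaluation of the round formulas, for comparison."""
    a, b, c, d, e, f, g, h = state
    a12 = rotl(a, 12)
    ss1 = rotl((a12 + e + rotl(t_const(j), j)) & MASK32, 7)
    ss2 = ss1 ^ a12
    tt1 = (ff(j, a, b, c) + d + ss2 + wp) & MASK32
    tt2 = (gg(j, e, f, g) + h + ss1 + w) & MASK32
    return (tt1, a, rotl(b, 9), c, p0(tt2), e, rotl(f, 19), g)
```

`stage_fx2(stage_fx1(state, j, w, wp))` should equal `round_reference(state, j, w, wp)` for any inputs, since each CSA preserves the sum of its inputs modulo $2^{32}$.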
With the process shown in FIG. 5, a single round calculation instruction completes one round in two beats: of two consecutive beats, one inputs the state words used by the round into the first pipeline stage, and the other produces the round's result in the second pipeline stage. Part of the pipeline relationship is shown in Table 1 below:
TABLE 1 (table image not reproduced)
In the round calculation process of FIG. 5, each round produces 8 state words, and the result of one round is the input of the next, so rounds are performed iteratively until the final round yields the result of the round calculation. Performing 64 iterative rounds therefore takes the round calculation unit 128 beats. Thus, in the multi-round calculation of SM3, consecutive rounds have a data dependency that forces them to execute serially, which limits the throughput of round calculation.
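For reference, the following compact software model of the SM3 hash (per GB/T 32905-2016) makes the serial dependency explicit: each iteration of the round loop consumes the eight state words produced by the previous iteration, and after 64 rounds the result is XORed with the chaining value. It is a plain software sketch, unrelated to the instruction-level design; `compress` and `sm3` are hypothetical names.

```python
import struct

MASK32 = 0xFFFFFFFF
IV = (0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,
      0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E)


def rotl(x, n):
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p0(x): return x ^ rotl(x, 9) ^ rotl(x, 17)
def p1(x): return x ^ rotl(x, 15) ^ rotl(x, 23)


def compress(v, block):
    """One 64-byte block: message expansion, then 64 strictly sequential rounds."""
    w = list(struct.unpack(">16I", block))
    for m in range(16, 68):
        w.append(p1(w[m - 16] ^ w[m - 9] ^ rotl(w[m - 3], 15)) ^ rotl(w[m - 13], 7) ^ w[m - 6])
    a, b, c, d, e, f, g, h = v
    for j in range(64):  # round j needs the eight state words of round j - 1
        tj = 0x79CC4519 if j <= 15 else 0x7A879D8A
        ff = a ^ b ^ c if j <= 15 else (a & b) | (a & c) | (b & c)
        gg = e ^ f ^ g if j <= 15 else (e & f) | (~e & g)
        a12 = rotl(a, 12)
        ss1 = rotl((a12 + e + rotl(tj, j)) & MASK32, 7)
        ss2 = ss1 ^ a12
        tt1 = (ff + d + ss2 + (w[j] ^ w[j + 4])) & MASK32
        tt2 = (gg + h + ss1 + w[j]) & MASK32
        a, b, c, d, e, f, g, h = tt1, a, rotl(b, 9), c, p0(tt2), e, rotl(f, 19), g
    return tuple(x ^ y for x, y in zip((a, b, c, d, e, f, g, h), v))


def sm3(msg: bytes) -> bytes:
    """Pad to a multiple of 64 bytes (0x80, zeros, 64-bit big-endian bit length) and hash."""
    padded = msg + b"\x80" + b"\x00" * ((55 - len(msg)) % 64) + struct.pack(">Q", 8 * len(msg))
    v = IV
    for off in range(0, len(padded), 64):
        v = compress(v, padded[off:off + 64])
    return struct.pack(">8I", *v)
```

`sm3(b"abc").hex()` should return the standard's test vector `66c7f0f462eeedd9d1f2d46bdc10e4e24167c4875cf2f7a2297da02b8f4ba8e0`.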
To remove this round-to-round data dependency, some further embodiments provide a round calculation scheme with an internal bypass and a corresponding microarchitecture: during round calculation, the state words produced by the second pipeline stage FX2 are returned to the first pipeline stage FX1, where they can be selected as the state words of the subsequent round. This eliminates the data dependency between consecutive rounds, so successive SM3 rounds can be pipelined; at the cost of a small amount of extra delay and hardware, every round after the first completes in one beat, which raises the throughput of round calculation.
Based on the above-mentioned idea, as an alternative implementation, fig. 6A exemplarily shows a flowchart of still another processing method of the SM3 algorithm provided in the embodiment of the present application, and the method flow may be implemented by a processor, for example, by a round calculation unit in the processor, and referring to fig. 6A, the method flow may include the following steps.
In step S610, the state words of the next state generated by the current round are returned from the second pipeline stage to the first pipeline stage.
In step S611, when the round following the current round is calculated, the returned state words of the next state are selected in the first pipeline stage as the state words used by that round; when the current round is calculated, the state words of the previous state are selected in the first pipeline stage as the state words used by the current round.
In step S612, according to the single round calculation instruction, the current round computes the state words it generates from the state words selected for it and its corresponding message word and message parameter; likewise, the round following the current round computes its state words from the state words selected for it and its corresponding message word and message parameter.
As an alternative implementation, in the first beat of the first round, the state words of the initial state are selected as the state words used by the first round, from which the state words of the next state are computed. Once the second pipeline stage has computed the state words of the next state, they are returned to the first pipeline stage through the bypass, where the state words for the subsequent round are selected from them together with the state words of the previous state already present in the first stage. In a possible implementation, let the current round be the $j$-th round, the state words of the previous state be $A_i$ to $H_i$, and the state words generated by the $j$-th round be $A_{i+1}$ to $H_{i+1}$. During the $j$-th round, $A_{i+1}$ to $H_{i+1}$ have not yet been produced, so $A_i$ to $H_i$ are selected as the state words for the $j$-th round. The state words $A_{i+1}$ to $H_{i+1}$ generated by the $j$-th round are returned to the first pipeline stage through the bypass, and with the bypass enabled, $A_{i+1}$ to $H_{i+1}$ are selected in the first stage as the state words for the $(j+1)$-th round. The state words $A_{i+2}$ to $H_{i+2}$ generated by the $(j+1)$-th round are in turn returned to the first stage through the bypass and take part in selection together with $A_{i+1}$ to $H_{i+1}$, which are by then the state words of the previous state in the first stage, and so on.
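The effect of the bypass on the beat count can be illustrated with a beat-level software model. This is a simplified sketch, not the patent's microarchitecture: it splits a round into hypothetical `stage_fx1` and `stage_fx2` functions (without the As/Ac carry-save bypass described later), counts two beats per round in the serial case, and in the bypass case lets FX1 take the state returned by FX2 in the same beat. Both modes must produce the same final state.

```python
MASK32 = 0xFFFFFFFF


def rotl(x, n):
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def t_const(j): return 0x79CC4519 if j <= 15 else 0x7A879D8A
def ff(j, x, y, z): return x ^ y ^ z if j <= 15 else (x & y) | (x & z) | (y & z)
def gg(j, x, y, z): return x ^ y ^ z if j <= 15 else (x & y) | (~x & z)
def p0(x): return x ^ rotl(x, 9) ^ rotl(x, 17)


def stage_fx1(state, j, w, wp):
    """First pipeline stage: everything that only needs the selected state words."""
    a, b, c, d, e, f, g, h = state
    a12 = rotl(a, 12)
    ss1 = rotl((a12 + e + rotl(t_const(j), j)) & MASK32, 7)
    part1 = (ff(j, a, b, c) + d + wp) & MASK32   # TT1 without SS2
    part2 = (gg(j, e, f, g) + h + w) & MASK32    # TT2 without SS1
    moved = (a, rotl(b, 9), c, e, rotl(f, 19), g)  # next B, C, D, F, G, H
    return ss1, ss1 ^ a12, part1, part2, moved


def stage_fx2(latched):
    """Second pipeline stage: finish TT1/TT2 and assemble the next state."""
    ss1, ss2, part1, part2, (b1, c1, d1, f1, g1, h1) = latched
    tt1 = (part1 + ss2) & MASK32
    tt2 = (part2 + ss1) & MASK32
    return (tt1, b1, c1, d1, p0(tt2), f1, g1, h1)


def run_rounds(initial, ws, wps, bypass):
    """Beat-level model of len(ws) >= 1 rounds; returns (final_state, beats)."""
    n = len(ws)
    if not bypass:
        state, beats = initial, 0
        for j in range(n):
            latched = stage_fx1(state, j, ws[j], wps[j])  # beat 1: input state, FX1
            state = stage_fx2(latched)                     # beat 2: FX2
            beats += 2
        return state, beats
    # Bypass: only the first round reads the (initial) state from outside.
    latched = stage_fx1(initial, 0, ws[0], wps[0])
    beats = 1
    for j in range(n):
        state = stage_fx2(latched)  # FX2 finishes round j in this beat
        if j + 1 < n:
            # FX1 selects the bypassed state instead of the previous state
            latched = stage_fx1(state, j + 1, ws[j + 1], wps[j + 1])
        beats += 1
    return state, beats
```

With 64 rounds the model counts 128 beats without the bypass and 65 beats with it, matching the figures given in the text.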
Returning the state words generated by each round to the first pipeline stage through the bypass removes the need to re-input state words for every round, saving most of the beats otherwise spent on inputting them.
As an optional implementation, the initial state words are input into the first pipeline stage in the first beat of the first round, and the state words for every later round are obtained by returning the state words generated in the second pipeline stage to the first stage. Apart from inputting the initial state words in that first beat, the state words produced by one round are generated in the second stage and returned to the first stage within one beat, and from the returned state words the second stage directly generates the state words of the next round in the following beat, so no beat is spent inputting the state words used by the next round. For ease of understanding, Table 2 below gives examples of the pipeline relationship of the embodiment.
TABLE 2 (table image not reproduced)
Under the bypass mechanism of the embodiment, every round except the first takes its state words from those returned to the first pipeline stage by the previous round, and its result is generated directly in the second pipeline stage in the next beat. Only the first round needs 2 beats; every other round completes in one beat, so 64 rounds take the round calculation unit 65 (1 × 2 + 63) beats instead of the 128 beats of the process in FIG. 5. Returning round results from the second pipeline stage to the first through the bypass thus substantially increases the speed of round calculation and, in turn, of SM3 processing as a whole.
Under the bypass mechanism, the embodiment further adapts the calculation logic of round calculation. With the state words divided into first partial state words (B, C, D, F, G, H) and second partial state words (A, E), after the state words of the next state generated by the current round ($A_{i+1}$ to $H_{i+1}$) are returned from the second pipeline stage to the first, they are selected against the state words of the previous state $A_i$ to $H_i$.
Taking the current $j$-th round as an example, the first pipeline stage processes the selected state words of the previous state $A_i$ to $H_i$. As an alternative implementation, the selected $A_i$, $C_i$, $E_i$ and $G_i$ are assigned (←) to give $B_{i+1}$, $D_{i+1}$, $F_{i+1}$ and $H_{i+1}$ in the second pipeline stage; at the same time, $B_i$ is rotated left in the first stage (for example, $B_i \lll 9$) to give $C_{i+1}$ in the second stage, and $F_i$ is rotated left in the first stage (for example, $F_i \lll 19$) to give $G_{i+1}$ in the second stage.
In further embodiments, the intermediate state words As and Ac obtained in the second pipeline stage are returned through the bypass to the first stage; As and Ac are the two terms from which the state word A of a round is directly produced. In the first stage, the cyclic left shift results of As and Ac can therefore be selected against the pair formed by the cyclic left shift result of the state word $A_i$ and 0. As an optional implementation, when the round after the $j$-th round is calculated, the cyclic left shift results of As and Ac are selected in the first stage for that round; when the current round is calculated, the cyclic left shift result of $A_i$ and 0 are selected in the first stage for the current round.
second partial state word (A) for next state i+1 ,E i+1 ) According to the calculation of (A), the embodiment of the application can select according to the first-stage flowing water i 、B i 、C i 、D i 、E i 、F i 、G i 、H i And a message word W j And a message parameter W j ', in the second stage the pipeline generates a status word A i+1 ,E i+1 . As an alternative implementation, fig. 6B exemplarily shows an alternative method flow diagram for calculating the status word by the SM3 algorithm according to the embodiment of the present application, and as shown in fig. 6B, the method flow may include the following steps.
In step S621, in the first pipeline stage, intermediate data for calculating the intermediate variables TT1 and TT2 are determined from the selected state words $A_i$, $B_i$, $C_i$, $D_i$, $E_i$, $F_i$, $G_i$ and $H_i$, the message word $W_j$ and the message parameter $W'_j$.
As an alternative implementation, for the $j$-th round, $FF_j$ is computed in the first pipeline stage from the selected state words $A_i$, $B_i$ and $C_i$; a first CSA operation is performed on the $FF_j$ result, the selected state word $D_i$ and the message parameter $W'_j$; and the first CSA result and the cyclic left shift result of the selected state word $A_i$ are stored in registers of the second pipeline stage.
meanwhile, in the first pipeline stage, a second CSA operation is performed on the cyclic left shift result of the selected A_i, 0, and the cyclic left shift result of T_j; a first addition is performed on the result of the second CSA operation, and the result of the first addition is cyclically left-shifted to obtain the intermediate variable S1;
at the same time, in the first pipeline stage, the selected state word E_i is cyclically left-shifted to obtain the intermediate variable S2; the GG_j operation is performed on the selected state words G_i, F_i and E_i, and a third CSA operation is performed on the GG_j result, the selected state word H_i and the message word W_j. A fourth CSA operation is then performed on the third CSA result and the intermediate variables S1 and S2, and the fourth CSA result is stored in a register of the second pipeline stage. Also in the first pipeline stage, a second addition is performed on S1 and S2 to obtain the intermediate variable SS1, which is held in a register of the second pipeline stage.
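The first pipeline stage relies on carry-save adders. A 3-input, 2-output CSA reduces three 32-bit operands to a sum word and a carry word whose 32-bit sum equals the sum of the inputs, so carry propagation is deferred to a single final adder. The sketch below models this behavior at word level. Building the 4-input, 2-output compressor from two 3:2 stages is one common construction and an assumption here, not necessarily the circuit used in the embodiment; all names and test values are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def csa32(a: int, b: int, c: int):
    """3:2 carry-save adder on 32-bit words.

    Returns (sum_word, carry_word) with
    (sum_word + carry_word) mod 2**32 == (a + b + c) mod 2**32.
    """
    sum_word = a ^ b ^ c                                  # per-bit sum, no carries
    carry_word = (((a & b) | (a & c) | (b & c)) << 1) & MASK32  # per-bit majority, weighted x2
    return sum_word, carry_word


def csa42(a: int, b: int, c: int, d: int):
    """4:2 compressor built from two chained 3:2 stages."""
    s1, c1 = csa32(a, b, c)
    return csa32(s1, c1, d)


# Mirroring the data flow described above with arbitrary test words:
gg, h, w_j, s1, s2 = 0x12345678, 0x9ABCDEF0, 0x0F0F0F0F, 0xFFFFFFFF, 0x80000001
t_sum, t_carry = csa32(gg, h, w_j)               # third CSA operation
x_sum, x_carry = csa42(s1, s2, t_sum, t_carry)   # fourth CSA operation
total = (x_sum + x_carry) & MASK32               # one carry-propagating 32-bit add
```

Only the final addition in this chain propagates carries across all 32 bits. The CSA stages have a constant depth regardless of word width, which is what makes them attractive on a critical path.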
In step S622, in the second pipeline stage, the intermediate variable TT1 is calculated from the intermediate data and the state word A_{i+1} is determined from TT1; also in the second pipeline stage, the intermediate variable TT2 is calculated from the intermediate data and the state word E_{i+1} is determined from TT2.
As an optional implementation, in the second pipeline stage, the embodiments of the present application may perform a third addition on the fourth CSA result to obtain the intermediate variable TT2, and then pass TT2 through the P_0 function to obtain the state word E_{i+1}.
Meanwhile, in the second pipeline stage, the embodiments of the present application may XOR the intermediate variable SS1 with the cyclic left shift result of the state word A_i to obtain the intermediate variable SS2, and perform a fifth CSA operation on SS2 and the first CSA result to obtain the intermediate state words As and Ac. On the one hand, As and Ac are returned to the first pipeline stage; on the other hand, a fourth addition is performed on As and Ac to obtain the intermediate variable TT1, and TT1 is processed by the left assignment operator (←) to obtain the state word A_{i+1}.
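Steps S621 and S622 must ultimately produce the result of one SM3 compression round. As a functional reference, the following sketch implements one round exactly as defined in the SM3 standard (GB/T 32905-2016), without the carry-save split or pipelining. It may serve as a golden model when checking a pipelined design. The function and argument names are hypothetical, and the sketch is not the hardware of the embodiment.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Cyclic left shift of a 32-bit word by n bits (n taken mod 32)."""
    n %= 32
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p0(x: int) -> int:
    """SM3 permutation P_0(X) = X ^ (X <<< 9) ^ (X <<< 17)."""
    return x ^ rotl32(x, 9) ^ rotl32(x, 17)


def sm3_round(j: int, state, w_j: int, w1_j: int):
    """One SM3 compression round j (0..63).

    state is (A_i, ..., H_i); w_j is W_j and w1_j is W_j'.
    Returns (A_{i+1}, ..., H_{i+1}).
    """
    a, b, c, d, e, f, g, h = state
    t_j = 0x79CC4519 if j < 16 else 0x7A879D8A
    if j < 16:
        ff, gg = a ^ b ^ c, e ^ f ^ g
    else:
        ff = (a & b) | (a & c) | (b & c)
        gg = ((e & f) | (~e & g)) & MASK32
    a12 = rotl32(a, 12)                                   # A_i <<< 12
    ss1 = rotl32((a12 + e + rotl32(t_j, j)) & MASK32, 7)
    ss2 = ss1 ^ a12
    tt1 = (ff + d + ss2 + w1_j) & MASK32                  # becomes A_{i+1}
    tt2 = (gg + h + ss1 + w_j) & MASK32                   # P_0(TT2) becomes E_{i+1}
    return (tt1, a, rotl32(b, 9), c, p0(tt2), e, rotl32(f, 19), g)
```

Only A_{i+1} and E_{i+1} require additions. The other six outputs are copies or fixed rotations of the previous state, which matches the split into a "first part" and a "second part" of the state words described in this embodiment.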
Based on the principle of the bypass mechanism described above, the embodiments of the present application further provide an optional hardware example for performing round calculation. On the basis of the processor implementation shown in FIG. 2, FIG. 7 shows an optional example of a round calculation unit provided by an embodiment of the present application. As shown in FIG. 7, the round calculation unit may include a bypass unit 710, multiple groups of selectors 720, and calculation logic 730;
the bypass unit 710 is configured to return the state word of the next state generated by the current round of calculation from the second pipeline stage to the first pipeline stage;
the multiple groups of selectors 720 are configured to select, in the first pipeline stage, the returned state word of the next state as the state word used by the next round of calculation when the next round is performed, and to select the state word of the previous state as the state word used by the current round when the current round is performed;
the calculation logic 730 is configured to perform the current round of calculation according to the single-round calculation instruction, obtaining the state word generated by the current round from the state word selected by the multiple groups of selectors together with the corresponding message word and message parameter; and to perform the round following the current round according to the single-round calculation instruction, obtaining the state word generated by that next round from the selected state word together with the corresponding message word and message parameter.
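The throughput benefit of the bypass unit can be illustrated with a simple cycle-count model. The sketch below assumes a two-stage round pipeline and one single-round instruction issued per cycle whenever its inputs are available. It ignores operand-register reads, message expansion, and instruction-issue overheads. The function names are hypothetical.

```python
def cycles_without_bypass(rounds: int, stages: int = 2) -> int:
    """Each round waits until the previous round's state words leave the last stage."""
    return rounds * stages


def cycles_with_bypass(rounds: int, stages: int = 2) -> int:
    """Round k+1 enters stage 1 one cycle after round k because its inputs are
    bypassed back from stage 2; only the first round pays the full latency."""
    if rounds == 0:
        return 0
    return stages + (rounds - 1)


# 64 SM3 rounds under this model:
# cycles_without_bypass(64) -> 128, cycles_with_bypass(64) -> 65
```

Under these assumptions, the bypass brings the steady-state cost down to one round per cycle. This is consistent with the statement that every round except the first completes in one beat.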
In further embodiments, the bypass unit 710 may also be configured to return the intermediate state words As and Ac obtained in the second pipeline stage to the first pipeline stage; As and Ac are the intermediate state words from which one round's state word A is directly produced;
the multiple groups of selectors 720 may also be configured to select, in the first pipeline stage, the cyclic left shift results of the intermediate state words As and Ac for the next round of calculation when the round following the j-th round is performed, and to select the cyclic left shift result of the state word A_i together with 0 for the current round when the current round is performed.
As an alternative implementation, FIG. 8 exemplarily shows another optional hardware example for performing round calculation provided by an embodiment of the present application. Except for the first round, the hardware structure shown in FIG. 8 can complete one round of calculation within one beat (clock cycle). As shown in FIG. 8, taking the j-th round as the current round, the j-th round generates the state words A_{i+1} to H_{i+1} of the next state, and the state words of the previous state are correspondingly A_i to H_i. The multiple groups of selectors in the round calculation unit may include a first group of selectors 810, a second group of selectors 820, and a third group of selectors 830;
the first group of selectors 810 may include a plurality of selectors (Mux). After the state words H_{i+1}, G_{i+1}, F_{i+1} and E_{i+1} are returned to the first pipeline stage through the bypass unit, these selectors select, in the first pipeline stage, the returned H_{i+1}, G_{i+1}, F_{i+1} and E_{i+1} as the state words used by the next round when the round following the j-th round is performed, and select H_i, G_i, F_i and E_i as the state words used by the current round when the current round is performed;
the second group of selectors 820 may include a plurality of selectors. After the state words A_{i+1}, B_{i+1}, C_{i+1} and D_{i+1} are returned to the first pipeline stage through the bypass unit, these selectors select, in the first pipeline stage, the returned A_{i+1}, B_{i+1}, C_{i+1} and D_{i+1} as the state words used by the next round when the round following the j-th round is performed, and select A_i, B_i, C_i and D_i as the state words used by the current round when the current round is performed;
the third group of selectors 830 may include a plurality of selectors. After the intermediate state words As and Ac, from which A_{i+1} is obtained, are bypassed back to the first pipeline stage, these selectors select, in the first pipeline stage, the cyclic left shift results of the returned As and Ac (As<<<12 and Ac<<<12, as shown in FIG. 8) for the next round when the round following the j-th round is performed, and select the cyclic left shift result of the state word A_i (A_i<<<12, as shown in FIG. 8) together with 0 for the current round when the current round is performed.
When the calculation logic 730 obtains the state words generated by each round from the selected state words, it may calculate the first part of the state words in the same manner as described in the corresponding section above. For example, in the j-th round, the state words A_i, C_i, E_i and G_i selected in the first pipeline stage are each processed by the left assignment operator (←) to produce the state words B_{i+1}, D_{i+1}, F_{i+1} and H_{i+1} in the second pipeline stage. At the same time, in the first pipeline stage, the selected state word B_i is cyclically left-shifted (by 9 bits in the SM3 standard) to produce the state word C_{i+1} in the second pipeline stage, and the selected state word F_i is cyclically left-shifted (by 19 bits in the SM3 standard) to produce the state word G_{i+1} in the second pipeline stage.
For the calculation of the state words A and E, the embodiments of the present application may adjust the calculation logic of the two pipeline stages to reduce the critical-path delay. For example, in the first pipeline stage, the calculation logic may determine the intermediate data for calculating the intermediate variables TT1 and TT2 from the selected state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i, the message word W_j and the message parameter W_j'. In the second pipeline stage, it calculates TT1 from the intermediate data and determines the state word A_{i+1} from TT1, and it calculates TT2 from the intermediate data and determines the state word E_{i+1} from TT2.
In a more specific optional implementation of calculating the state words A and E, as shown in FIG. 8 and taking the j-th round as an example, in the first pipeline stage the calculation logic may perform the FF_j operation on the state words A_i, B_i and C_i selected by the second group of selectors 820. The FF_j result, the state word D_i selected by the second group of selectors 820, and the message parameter W_j' are input to a first 3-input, 2-output CSA unit (CSA32_1 in FIG. 8) to perform the first CSA operation. At the same time, the state word A_i selected by the second group of selectors 820 is cyclically left-shifted (by 12 bits, <<<12, as shown in FIG. 8). The first CSA result of CSA32_1 and the cyclic left shift result of A_i are stored in registers of the second pipeline stage;
also in the first pipeline stage, the calculation logic may input the cyclic left shift result of A_i and 0, both selected by the third group of selectors 830, together with the cyclic left shift result of T_j (T_j<<<j in FIG. 8), to a second 3-input, 2-output CSA unit (CSA32_2 in FIG. 8) to perform the second CSA operation. The result of the second CSA operation of CSA32_2 is input to a first 32-bit adder (Adder32_1 in FIG. 8) for the first addition. The result of the first addition of Adder32_1 is then cyclically left-shifted (by 7 bits, <<<7, as shown in FIG. 8) to obtain the intermediate variable S1;
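Because T_j depends only on the round index, the operand T_j<<<j that enters CSA32_2 is a per-round constant. One option, which is an assumption and not stated in the embodiment, is to precompute it. The sketch below builds such a table from the standard constants. Note that the SM3 standard rotates by j mod 32, so rounds 32 to 63 reuse the rotation amounts 0 to 31. The names `t_const` and `T_ROTATED` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Cyclic left shift of a 32-bit word by n bits (n taken mod 32)."""
    n %= 32
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def t_const(j: int) -> int:
    """Round constant T_j from the SM3 standard."""
    return 0x79CC4519 if j < 16 else 0x7A879D8A


# Table of T_j <<< j for rounds 0..63.
T_ROTATED = [rotl32(t_const(j), j) for j in range(64)]
```

For example, round 16 uses 0x7A879D8A rotated by 16 bits, which is 0x9D8A7A87. Round 32 uses the unrotated 0x7A879D8A.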
also in the first pipeline stage, the calculation logic may cyclically left-shift the state word E_i selected by the first group of selectors 810 (by 7 bits, <<<7, as shown in FIG. 8) to obtain the intermediate variable S2. The calculation logic may also perform the GG_j operation on the state words G_i, F_i and E_i selected by the first group of selectors 810. The GG_j result, the state word H_i selected by the first group of selectors 810, and the message word W_j are input to a third 3-input, 2-output CSA unit (CSA32_3 in FIG. 8) to perform the third CSA operation;
in the first pipeline stage, the calculation logic may send the intermediate variables S1 and S2 and the third CSA result of CSA32_3 to a first 4-input, 2-output CSA unit (CSA42_1 in FIG. 8) to perform the fourth CSA operation; the fourth CSA result of CSA42_1 is stored in a register of the second pipeline stage;
meanwhile, in the first pipeline stage, the calculation logic may send the intermediate variables S1 and S2 to a second 32-bit adder (Adder32_2 in FIG. 8) to obtain the intermediate variable SS1, which is held in a register of the second pipeline stage.
In the second pipeline stage, the calculation logic may send the fourth CSA result of CSA42_1 to a third 32-bit adder (Adder32_3 in FIG. 8) for the third addition, obtaining the intermediate variable TT2. TT2 is passed through the P_0 function to obtain the state word E_{i+1}, which is then returned to the first pipeline stage through the bypass;
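The P_0 permutation applied to TT2 is defined in the SM3 standard as P_0(X) = X XOR (X<<<9) XOR (X<<<17). Each output bit is therefore the XOR of three input bits taken at fixed offsets. The short sketch below restates that definition; the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Cyclic left shift of a 32-bit word by n bits (n taken mod 32)."""
    n %= 32
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p0(x: int) -> int:
    """SM3 permutation P_0, applied to TT2 to produce E_{i+1}."""
    return x ^ rotl32(x, 9) ^ rotl32(x, 17)


# Example: p0(1) == 0x00020201 (bits 0, 9 and 17 set)
```

Because P_0 uses only fixed rotations and XORs, it is linear over XOR. After the final carry-propagating add of TT2, it adds only a shallow, fixed layer of logic.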
in the second pipeline stage, the calculation logic may XOR the intermediate variable SS1 with the cyclic left shift result of the state word A_i to obtain the intermediate variable SS2. SS2 and the first CSA result of CSA32_1 are fed into a fourth 3-input, 2-output CSA unit (CSA32_4 in FIG. 8) to perform the fifth CSA operation, obtaining the intermediate state words As and Ac. As and Ac are returned to the first pipeline stage and are also sent to a fourth 32-bit adder (Adder32_4 in FIG. 8) for the fourth addition, obtaining the intermediate variable TT1. TT1 is processed by the left assignment operator to obtain the state word A_{i+1}, which is bypassed back to the first pipeline stage.
The above describes an example process in which the calculation logic 730 performs the j-th round using the selected state words A_i to H_i. When the round following the j-th round is performed, the embodiments of the present application may, based on the bypass mechanism, select the state words A_{i+1} to H_{i+1} accordingly and perform the round calculation. That process can be adapted from the corresponding description above, with only the message word and message parameter changed accordingly.
It can be seen that, for the calculation of the state words A and E, the embodiments of the present application may return the intermediate state words As and Ac, from which the state word A is obtained, to the first pipeline stage. As and Ac are each cyclically left-shifted by 12 bits, and the third group of selectors 830 then chooses between them and (A_i<<<12, 0). If the next round is to be performed, the bypass is enabled and the third group of selectors 830 selects (As<<<12, Ac<<<12) for the round calculation; otherwise it selects (A_i<<<12, 0). The selection result of the third group of selectors 830 and T_j<<<j are compressed by CSA32_2, and the compression result of CSA32_2 is sent to Adder32_1; the output of Adder32_1 is cyclically left-shifted by 7 bits to obtain the intermediate variable S1. On the other hand, the first group of selectors 810 selects between E_{i+1} and E_i, and the selection result is cyclically left-shifted by 7 bits to obtain the intermediate variable S2;
further, S1, S2 and the two outputs of CSA32_3 are fed into CSA42_1 to generate two compressed results. These are stored in the FX2 register and then passed through Adder32_3 and the P_0 function to obtain the one-round result for the state word E;
meanwhile, S1 and S2 produce the intermediate variable SS1 through Adder32_2, and SS1 is stored in the FX2 register. Then, in the second stage, SS1 and A_i<<<12 pass through a 32-bit exclusive-or (XOR) to obtain the intermediate variable SS2. SS2 and the CSA32_1 result stored in the FX2 register are compressed by CSA32_4 to generate As and Ac; As and Ac are bypassed to the first pipeline stage and simultaneously input to Adder32_4, producing the one-round result for the state word A.
The SM3 processing scheme provided by the embodiments of the present application can perform round calculation at a vector granularity equal to the bit width of the round calculation result (for example, 256 bits). This makes full use of the bit width of the operand registers of the vector unit and increases the processing speed of SM3. Further, because each SM3 round depends on the result of the previous round, round calculations can otherwise only be executed serially, which limits throughput. The embodiments of the present application therefore further provide a microarchitecture that supports an internal bypass: the state words and intermediate state words produced in the second pipeline stage of one round are returned to the first pipeline stage through the bypass. This removes the stall caused by the data dependency between consecutive rounds, so consecutive SM3 rounds can be pipelined. Except for the first round, the embodiments can achieve a throughput of one round per beat, further increasing the processing speed and computational performance of the SM3 algorithm.
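To illustrate the 256-bit vector granularity: the eight 32-bit state words A to H of one round result fit exactly into one 256-bit operand (8 x 32 = 256 bits). The sketch below packs and unpacks such an operand. The lane order (A in the most significant lane) and the function names are hypothetical choices, since the embodiment does not specify a layout here.

```python
MASK32 = 0xFFFFFFFF


def pack256(words) -> int:
    """Pack eight 32-bit words into one 256-bit value, words[0] in the top lane."""
    if len(words) != 8:
        raise ValueError("expected exactly 8 state words")
    value = 0
    for w in words:
        value = (value << 32) | (w & MASK32)
    return value


def unpack256(value: int):
    """Inverse of pack256: return the eight 32-bit lanes, top lane first."""
    return tuple((value >> (32 * (7 - k))) & MASK32 for k in range(8))


# SM3 initial value V0 (A..H), packed as one 256-bit operand
IV = (0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,
      0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E)
packed_iv = pack256(IV)
```

With this layout, one round instruction reads and writes a single full-width register instead of eight scalar ones. That single-register access is the register-width utilization the paragraph above refers to.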
Further, an embodiment of the present application also provides a chip, where the chip may include the processor provided in the embodiment of the present application.
Further, an embodiment of the present application also provides an electronic device, such as a terminal device or a server device, which may include the processor provided in the embodiments of the present application.
While multiple embodiments of the present application have been described above, the alternatives described in the various embodiments can be combined and cross-referenced where they do not conflict. This extends the range of possible embodiments, all of which may be regarded as embodiments disclosed by the present application.
Although the embodiments of the present application are disclosed above, the present application is not limited thereto. Various changes and modifications may be effected by one skilled in the art without departing from the spirit and scope of the application, and the scope of protection is defined by the claims.

Claims (26)

1. A method for processing an SM3 algorithm, comprising:
reading a message word and a message parameter corresponding to the current round of calculation from a first source operand of a first source operand register, wherein the first source operand comprises a plurality of message words and message parameters, and the bit width of the first source operand is the same as the bit width of a round calculation result;
calculating, according to a single-round calculation instruction, a state word of a next state using a state word of a previous state and the message word and message parameter corresponding to the current round;
and storing the state word of the next state in a first destination operand of a first destination operand register, wherein the bit width of the first destination operand is the same as the bit width of the round calculation result.
2. The method of claim 1, wherein the current round of calculation is the j-th round, and the first source operand comprising a plurality of message words and message parameters comprises:
the first source operand comprising message words W_j, W_{j+1}, W_{j+2}, W_{j+3} and message parameters W_j', W_{j+1}', W_{j+2}', W_{j+3}';
The reading of the message word and the message parameter corresponding to the current round of calculation from the first source operand of the first source operand register comprises:
reading the message word W_j and the message parameter W_j' corresponding to the j-th round from the first source operand.
3. The method of claim 2, wherein the state words comprise A, B, C, D, E, F, G, H; the state words of the next state obtained by the j-th round comprise A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1}; and the state words of the previous state comprise A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i.
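Claims 2 and 3 describe a 256-bit first source operand carrying four message words and four message parameters (8 x 32 bits). One reading is that a single such operand can feed the single-round instructions for rounds j to j+3. The sketch below models that reading using the SM3 standard's definition W_j' = W_j XOR W_{j+4}. The lane order and all helper names are hypothetical.

```python
def message_params(w):
    """W'_j = W_j XOR W_{j+4} for j = 0..63, from the 68 expanded words W_0..W_67."""
    if len(w) < 68:
        raise ValueError("need the 68 expanded message words")
    return [w[j] ^ w[j + 4] for j in range(64)]


def first_source_operand(w, w_prime, j):
    """Hypothetical layout: lanes 0-3 hold W_j..W_{j+3}, lanes 4-7 hold W'_j..W'_{j+3}."""
    return tuple(w[j:j + 4]) + tuple(w_prime[j:j + 4])


def read_round_inputs(operand, j_base, j):
    """Return (W_j, W'_j) for round j from an operand built starting at round j_base."""
    k = j - j_base
    if not 0 <= k < 4:
        raise ValueError("round j is not covered by this operand")
    return operand[k], operand[4 + k]
```

Keeping W_j and W_j' in the same operand lets one register read supply both per-round inputs that the round logic consumes.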
4. The method of claim 3, wherein calculating the state word of the next state according to a single-round calculation instruction using the state word of the previous state and the message word and message parameter corresponding to the current round comprises:
calculating, according to the single-round calculation instruction, the state words A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1} of the next state using the state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i of the previous state, the message word W_j and the message parameter W_j';
storing the state word of the next state in the first destination operand of the first destination operand register comprises:
storing the state words A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1} of the next state in the first destination operand of the first destination operand register.
5. The method of claim 1, further comprising:
reading message words for message expansion from a second source operand of a second source operand register and a third source operand of a third source operand register, wherein the bit widths of the second and third source operands are the same as the bit width of the round calculation result, the second source operand comprises a first number of message words with the largest sequence numbers obtained so far, and the third source operand comprises the first number of message words preceding those in the second source operand;
expanding, according to a single message expansion instruction, the read message words to generate the first number of message words following those in the second source operand;
storing the message word generated by the expansion in a second destination operand of a second destination operand register; wherein the bit width of the second destination operand is the same as the bit width of the round calculation result.
6. The method of claim 5, wherein the second source operand comprising the first number of message words with the largest sequence numbers obtained so far comprises: the second source operand comprising the 8 message words with the largest sequence numbers obtained so far;
the third source operand comprising the first number of message words preceding those in the second source operand comprises: the third source operand comprising the 8 message words preceding those in the second source operand;
expanding, according to a single message expansion instruction, the read message words to generate the first number of message words following those in the second source operand comprises:
expanding, according to the message expansion instruction, the read message words to generate the 8 message words following those in the second source operand.
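Claims 5 and 6 rely on the fact that each new SM3 message word depends only on the preceding 16 words: W_j = P_1(W_{j-16} XOR W_{j-9} XOR (W_{j-3}<<<15)) XOR (W_{j-13}<<<7) XOR W_{j-6}, with P_1(X) = X XOR (X<<<15) XOR (X<<<23). Two operands holding 16 consecutive words are therefore enough to produce the next 8. Within one batch, W_{j-3} can refer to a word generated earlier in the same batch. The sketch below checks the 8-at-a-time form against a plain sequential expansion. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Cyclic left shift of a 32-bit word by n bits (n taken mod 32)."""
    n %= 32
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p1(x: int) -> int:
    """SM3 permutation P_1(X) = X ^ (X <<< 15) ^ (X <<< 23)."""
    return x ^ rotl32(x, 15) ^ rotl32(x, 23)


def next_word(w, j):
    """Standard SM3 expansion step for index j >= 16 of list w."""
    return (p1(w[j - 16] ^ w[j - 9] ^ rotl32(w[j - 3], 15))
            ^ rotl32(w[j - 13], 7) ^ w[j - 6])


def expand_full(block16):
    """Sequential expansion of W_0..W_15 into W_0..W_67."""
    w = list(block16)
    for j in range(16, 68):
        w.append(next_word(w, j))
    return w


def expand8(older8, newer8):
    """Generate the 8 words after `newer8`, given the 8 words before it.

    older8 plays the role of the third source operand, newer8 the second.
    """
    window = list(older8) + list(newer8)   # 16 consecutive words W_{n-16}..W_{n-1}
    for k in range(8):
        window.append(next_word(window, 16 + k))
    return window[16:]
```

The windowed form needs only the last 16 words, which is what allows a fixed pair of full-width registers to drive the expansion.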
7. The method of claim 4, further comprising:
returning the state word of the next state generated by the current round of calculation from the second pipeline stage to the first pipeline stage;
if the round following the current round is performed, selecting, in the first pipeline stage, the returned state word of the next state as the state word used by that next round; and if the current round is performed, selecting, in the first pipeline stage, the state word of the previous state as the state word used by the current round.
8. The method of claim 7, further comprising:
returning the intermediate state words As and Ac obtained in the second pipeline stage to the first pipeline stage through a bypass, wherein As and Ac are the intermediate state words from which one round's state word A is directly produced;
if the round following the j-th round is performed, selecting, in the first pipeline stage, the cyclic left shift results of the intermediate state words As and Ac for that next round; and if the current round is performed, selecting, in the first pipeline stage, the cyclic left shift result of the state word A_i together with 0 for the current round.
9. The method of claim 8, wherein calculating, according to the single-round calculation instruction, the state words A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1} of the next state using the state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i of the previous state, the message word W_j and the message parameter W_j' comprises:
performing the current round of calculation according to the single-round calculation instruction, and obtaining the state word generated by the current round from the selected state word and the corresponding message word and message parameter;
the method further comprises the following steps:
performing the round following the current round according to the single-round calculation instruction, and obtaining the state word generated by that next round from the selected state word and the corresponding message word and message parameter.
10. The method of claim 9, wherein performing the current round of calculation according to the single-round calculation instruction and obtaining the state word generated by the current round from the selected state word and the corresponding message word and message parameter comprises:
for the j-th round, in the first pipeline stage, processing the selected A_i, C_i, E_i and G_i respectively by the left assignment operator to obtain B_{i+1}, D_{i+1}, F_{i+1} and H_{i+1} in the second pipeline stage; in the first pipeline stage, cyclically left-shifting the selected B_i to obtain C_{i+1} in the second pipeline stage; and, in the first pipeline stage, cyclically left-shifting the selected F_i to obtain G_{i+1} in the second pipeline stage;
and generating the state words A_{i+1} and E_{i+1} in the second pipeline stage from the state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i, the message word W_j and the message parameter W_j' selected in the first pipeline stage.
11. The method of claim 10, wherein generating the state words A_{i+1} and E_{i+1} in the second pipeline stage from the state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i, the message word W_j and the message parameter W_j' selected in the first pipeline stage comprises:
in the first pipeline stage, determining intermediate data for calculating the intermediate variables TT1 and TT2 from the selected state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i, the message word W_j and the message parameter W_j';
in the second pipeline stage, calculating the intermediate variable TT1 from the intermediate data and determining the state word A_{i+1} from TT1; and, in the second pipeline stage, calculating the intermediate variable TT2 from the intermediate data and determining the state word E_{i+1} from TT2.
12. The method of claim 11, wherein determining, in the first pipeline stage, the intermediate data for calculating the intermediate variables TT1 and TT2 from the selected state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i, the message word W_j and the message parameter W_j' comprises:
in the first pipeline stage, performing the FF_j operation on the selected state words A_i, B_i and C_i, and performing a first CSA operation on the FF_j result, the selected state word D_i and the message parameter W_j';
performing a second CSA operation on the cyclic left shift result of the selected A_i, 0, and the cyclic left shift result of T_j; performing a first addition on the second CSA result; and cyclically left-shifting the first addition result to obtain the intermediate variable S1;
cyclically left-shifting the selected state word E_i to obtain the intermediate variable S2; performing the GG_j operation on the selected state words G_i, F_i and E_i; and performing a third CSA operation on the GG_j result, the selected state word H_i and the message word W_j;
performing a fourth CSA operation on the third CSA result and the intermediate variables S1 and S2 to obtain a fourth CSA result; and performing a second addition on S1 and S2 to obtain the intermediate variable SS1.
13. The method of claim 12, wherein calculating, in the second pipeline stage, the intermediate variable TT1 from the intermediate data and determining the state word A_{i+1} from TT1 comprises:
XORing the intermediate variable SS1 with the cyclic left shift result of the state word A_i to obtain the intermediate variable SS2; performing a fifth CSA operation on SS2 and the first CSA result to obtain the intermediate state words As and Ac; performing a fourth addition on As and Ac to obtain the intermediate variable TT1; and processing TT1 by the left assignment operator to obtain the state word A_{i+1};
and calculating, in the second pipeline stage, the intermediate variable TT2 from the intermediate data and determining the state word E_{i+1} from TT2 comprises:
performing, in the second pipeline stage, a third addition on the fourth CSA result to obtain the intermediate variable TT2, and passing TT2 through the P_0 function to obtain the state word E_{i+1}.
14. A processor, comprising: a round calculation unit, and an operand register; the operand register includes: a first source operand register and a first destination operand register; wherein the first source operand register sets a first source operand, the first source operand including a plurality of message words and message parameters, and a bit width of the first source operand being the same as a bit width of a round of computation result; the first destination operand register is used for setting a first destination operand, and the bit width of the first destination operand is the same as that of the round calculation result;
the processor is configured with a single round of computation instructions; the round calculation unit is used for reading a message word and a message parameter corresponding to the current round calculation from a first source operand of the first source operand register; according to a single round calculation instruction, calculating a state word of a next state by using a state word of a previous state and a corresponding message word and a corresponding message parameter of a current round; the state word for the next state is stored in a first destination operand of a first destination operand register.
15. The processor of claim 14, wherein the current round is the j-th round, and the first source operand including a plurality of message words and message parameters comprises:
the first source operand including message words W_j, W_{j+1}, W_{j+2}, W_{j+3} and message parameters W'_j, W'_{j+1}, W'_{j+2}, W'_{j+3};
the round calculation unit reading, from the first source operand in the first source operand register, the message word and message parameter corresponding to the current round comprises:
reading, from the first source operand, the message word W_j and the message parameter W'_j corresponding to the j-th round.
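In SM3 the message parameter is W'_j = W_j ⊕ W_{j+4}. One 256-bit first source operand holding W_j..W_{j+3} and W'_j..W'_{j+3} can therefore supply four consecutive rounds. The Python sketch below rests on three hypothetical assumptions:

- the message words occupy the upper 128 bits;
- the message parameters occupy the lower 128 bits;
- the operand is built for a j that is a multiple of 4.

```python
MASK32 = 0xFFFFFFFF


def make_round_operand(w, j):
    """Build a hypothetical first source operand for rounds j..j+3.

    Upper 128 bits: W_j..W_{j+3}; lower 128 bits: W'_j..W'_{j+3},
    where W'_k = W_k XOR W_{k+4} as defined by SM3.
    """
    if j % 4:
        raise ValueError("this sketch assumes j is a multiple of 4")
    msg = [w[j + k] & MASK32 for k in range(4)]
    par = [(w[j + k] ^ w[j + k + 4]) & MASK32 for k in range(4)]
    value = 0
    for lane in msg + par:
        value = (value << 32) | lane
    return value


def read_round_inputs(operand, j):
    """Extract (W_j, W'_j) for round j from the operand built for its group of four rounds."""
    k = j % 4
    w_j = (operand >> (32 * (7 - k))) & MASK32
    w_prime_j = (operand >> (32 * (3 - k))) & MASK32
    return w_j, w_prime_j
```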
16. The processor of claim 15, wherein:
- the state words comprise A, B, C, D, E, F, G, H;
- the state words of the next state obtained by the j-th round are A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1};
- the state words of the previous state are A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i;

the round calculation unit calculating, according to the single-round calculation instruction, the state words of the next state from the state words of the previous state and the message word and message parameter of the current round comprises:
calculating, according to the single-round calculation instruction, the state words A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1} of the next state from the state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i of the previous state, the message word W_j and the message parameter W'_j.
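A plain software reference of the SM3 round function is useful for checking a hardware or simulator model of the single-round instruction. The Python sketch below follows GB/T 32905-2016 directly and does not model the patent's pipeline:

- `sm3_round` maps A_i..H_i to A_{i+1}..H_{i+1} from W_j and W'_j.
- The surrounding `sm3` function is included only so the round can be checked against the standard's "abc" example.

```python
import struct

MASK32 = 0xFFFFFFFF
IV = [0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,
      0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E]


def rotl32(x, n):
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p0(x):
    return x ^ rotl32(x, 9) ^ rotl32(x, 17)


def p1(x):
    return x ^ rotl32(x, 15) ^ rotl32(x, 23)


def sm3_round(state, j, w_j, w_prime_j):
    """One SM3 compression round: (A_i..H_i) -> (A_{i+1}..H_{i+1})."""
    a, b, c, d, e, f, g, h = state
    t_j = 0x79CC4519 if j < 16 else 0x7A879D8A
    if j < 16:
        ff, gg = a ^ b ^ c, e ^ f ^ g
    else:
        ff = (a & b) | (a & c) | (b & c)
        gg = (e & f) | (~e & g)
    ss1 = rotl32((rotl32(a, 12) + e + rotl32(t_j, j)) & MASK32, 7)
    ss2 = ss1 ^ rotl32(a, 12)
    tt1 = (ff + d + ss2 + w_prime_j) & MASK32
    tt2 = (gg + h + ss1 + w_j) & MASK32
    # A=TT1, B=A, C=B<<<9, D=C, E=P0(TT2), F=E, G=F<<<19, H=G
    return [tt1, a, rotl32(b, 9), c, p0(tt2), e, rotl32(f, 19), g]


def expand(block):
    """Message expansion of one 64-byte block into W_0..W_67."""
    w = list(struct.unpack(">16I", block))
    for j in range(16, 68):
        w.append(p1(w[j - 16] ^ w[j - 9] ^ rotl32(w[j - 3], 15))
                 ^ rotl32(w[j - 13], 7) ^ w[j - 6])
    return w


def sm3(message: bytes) -> bytes:
    bit_len = 8 * len(message)
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", bit_len)
    v = IV[:]
    for off in range(0, len(padded), 64):
        w = expand(padded[off:off + 64])
        state = v[:]
        for j in range(64):
            state = sm3_round(state, j, w[j], w[j] ^ w[j + 4])
        v = [x ^ y for x, y in zip(v, state)]
    return struct.pack(">8I", *v)
```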
17. The processor of claim 14, further comprising a message expansion unit; the operand registers further include a second source operand register, a third source operand register and a second destination operand register;
the second source operand register holds a second source operand and the third source operand register holds a third source operand. Both have the same bit width as the round calculation result. The second source operand includes the already obtained message words with the highest indices, and the third source operand includes the first number of message words preceding those in the second source operand;
the processor is configured with a single message expansion instruction; the message expansion unit is configured to read the message words used for expansion from the second source operand in the second source operand register and the third source operand in the third source operand register; to generate from them, according to the single message expansion instruction, the first number of message words following those in the second source operand; and to store the generated message words in the second destination operand register.
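The expansion instruction can be modelled as a sliding window over the SM3 recurrence W_j = P_1(W_{j-16} ⊕ W_{j-9} ⊕ (W_{j-3} <<< 15)) ⊕ (W_{j-13} <<< 7) ⊕ W_{j-6}, where P_1(X) = X ⊕ (X <<< 15) ⊕ (X <<< 23). In this Python sketch, both source operands hold eight words and the "first number" is taken to be 8; both are assumptions. `expand_instruction` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p1(x):
    return x ^ rotl32(x, 15) ^ rotl32(x, 23)


def expand_instruction(src3, src2, count=8):
    """Hypothetical single message-expansion instruction.

    src3: the eight message words preceding src2.
    src2: the eight most recently obtained message words (highest indices).
    Returns the next `count` words, each produced by the SM3 recurrence.
    """
    window = list(src3) + list(src2)  # W_{j-16} .. W_{j-1}
    if len(window) != 16:
        raise ValueError("expected two operands of eight 32-bit words")
    for _ in range(count):
        n = len(window)
        window.append(p1(window[n - 16] ^ window[n - 9] ^ rotl32(window[n - 3], 15))
                      ^ rotl32(window[n - 13], 7) ^ window[n - 6])
    return window[16:]
```

Generating eight words per instruction means the later outputs depend on earlier outputs of the same instruction, through W_{j-3} and W_{j-6}. A hardware unit would have to chain those dependencies internally.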
18. The processor of claim 16, wherein the round calculation unit comprises:
a bypass unit, configured to return the state words of the next state generated by the current round from the second pipeline stage to the first pipeline stage;
multiple groups of selectors, configured as follows:
- when the round following the current round is calculated, they select, in the first pipeline stage, the returned state words of the next state as the state words for that round;
- when the current round is calculated, they select the state words of the previous state in the first pipeline stage as the state words for the current round;

calculation logic, configured, according to the single-round calculation instruction, to:
- obtain the state words generated by the current round from the state words selected by the selectors and the corresponding message word and message parameter;
- obtain the state words generated by the round following the current round from the selected state words and the corresponding message word and message parameter.
19. The processor of claim 18, wherein the bypass unit is further configured to return the intermediate state words As and Ac obtained in the second pipeline stage to the first pipeline stage. As and Ac are the words from which the state word A is directly produced in a round calculation;
the multiple groups of selectors are further configured as follows:
- when the round following the j-th round is calculated, they select, in the first pipeline stage, the cyclic left shift results of the returned intermediate state words As and Ac for that round;
- when the current round is calculated, they select the cyclic left shift result of the state word A_i together with 0 in the first pipeline stage for the current round.
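The intermediate state words As and Ac form a carry-save (redundant) representation: A_{i+1} = As + Ac mod 2^32 is obtained only after a carry-propagate addition. The claims do not spell out the internals of the CSA units. The Python sketch below therefore assumes the textbook full-adder form of a 3-input, 2-output CSA, and its function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def csa32(x, y, z):
    """3-input, 2-output carry-save adder on 32-bit words.

    Returns (sum, carry) with sum + carry == x + y + z (mod 2**32);
    no carry propagates between bit positions inside the CSA itself.
    """
    s = x ^ y ^ z
    c = (((x & y) | (x & z) | (y & z)) << 1) & MASK32
    return s, c


def resolve(pair):
    """Carry-propagate addition turning a redundant (As, Ac) pair into one word."""
    s, c = pair
    return (s + c) & MASK32
```

Forwarding the pair instead of the resolved sum avoids waiting for the full 32-bit carry chain. That is the usual motivation for forwarding carry-save values.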
20. The processor of claim 19, wherein the multiple groups of selectors comprise:
a first group of selectors, comprising a plurality of selectors that:
- when the round following the j-th round is calculated, select, in the first pipeline stage, the returned state words H_{i+1}, G_{i+1}, F_{i+1} and E_{i+1} as the state words for that round;
- when the current round is calculated, select the state words H_i, G_i, F_i and E_i in the first pipeline stage as the state words for the current round;

a second group of selectors, comprising a plurality of selectors that:
- when the round following the j-th round is calculated, select, in the first pipeline stage, the returned state words A_{i+1}, B_{i+1}, C_{i+1} and D_{i+1} as the state words for that round;
- when the current round is calculated, select the state words A_i, B_i, C_i and D_i in the first pipeline stage as the state words for the current round;

a third group of selectors, comprising a plurality of selectors that:
- when the round following the j-th round is calculated, select, in the first pipeline stage, the cyclic left shift results of the returned intermediate state words As and Ac for that round;
- when the current round is calculated, select the cyclic left shift result of the state word A_i together with 0 in the first pipeline stage for the current round.
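The routing performed by the three selector groups can be modelled in Python as follows. The sketch models only which values are routed, not the arithmetic that consumes them. Its assumptions are:

- the flag `back_to_back`;
- the function `select_inputs`;
- the 12-bit rotation, which is taken from A_i <<< 12 in GB/T 32905-2016 (the claims say only "cyclic left shift").

```python
from typing import NamedTuple, Tuple

MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


class Selected(NamedTuple):
    efgh: Tuple[int, int, int, int]  # first group: E, F, G, H
    abcd: Tuple[int, int, int, int]  # second group: A, B, C, D
    a_pair: Tuple[int, int]          # third group: two words fed to the second CSA


def select_inputs(back_to_back, forwarded_state, forwarded_as_ac, register_state):
    """Route state words to the first pipeline stage (claims 18 to 20).

    back_to_back: True when this round directly follows the round whose
        results are on the bypass path.
    forwarded_state: (A..H) of the next state returned from stage 2.
    forwarded_as_ac: (As, Ac) returned from stage 2.
    register_state: (A..H) of the previous state held in stage 1.
    """
    a, b, c, d, e, f, g, h = forwarded_state if back_to_back else register_state
    if back_to_back:
        as_, ac = forwarded_as_ac
        pair = (rotl32(as_, 12), rotl32(ac, 12))
    else:
        pair = (rotl32(a, 12), 0)
    return Selected((e, f, g, h), (a, b, c, d), pair)
```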
21. The processor of claim 20, wherein the calculation logic obtaining, in the current round and according to the single-round calculation instruction, the state words generated by the current round from the state words selected by the multiple groups of selectors, the corresponding message word and the message parameter comprises, for the j-th round:
- processing the selected A_i, C_i, E_i and G_i in the first pipeline stage with the assignment operator (←), and latching the results for one clock cycle into the second pipeline stage to obtain B_{i+1}, D_{i+1}, F_{i+1} and H_{i+1};
- performing, in the first pipeline stage, a cyclic left shift on the selected B_i to obtain C_{i+1} in the second pipeline stage;
- performing, in the first pipeline stage, a cyclic left shift on the selected F_i to obtain G_{i+1} in the second pipeline stage;
- generating the state words A_{i+1} and E_{i+1} in the second pipeline stage from A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i selected in the first pipeline stage, the message word W_j and the message parameter W'_j.
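The word moves of this claim need no adder. In the Python sketch below, the rotation amounts 9 and 19 come from GB/T 32905-2016, because the claim says only "cyclic left shift". The function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def move_words(a, b, c, e, f, g):
    """Adder-free part of one SM3 round.

    B_{i+1} = A_i, C_{i+1} = B_i <<< 9, D_{i+1} = C_i,
    F_{i+1} = E_i, G_{i+1} = F_i <<< 19, H_{i+1} = G_i.
    """
    return {"B": a, "C": rotl32(b, 9), "D": c,
            "F": e, "G": rotl32(f, 19), "H": g}
```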
22. The processor of claim 21, wherein the calculation logic generating the state words A_{i+1} and E_{i+1} in the second pipeline stage comprises:
- in the first pipeline stage, determining intermediate data for calculating intermediate variables TT1 and TT2 from the selected state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i, the message word W_j and the message parameter W'_j;
- in the second pipeline stage, calculating TT1 from the intermediate data and determining the state word A_{i+1} from TT1;
- in the second pipeline stage, calculating TT2 from the intermediate data and determining the state word E_{i+1} from TT2.
23. The processor of claim 22, wherein the calculation logic determining, in the first pipeline stage, the intermediate data for calculating TT1 and TT2 from the selected state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i, the message word W_j and the message parameter W'_j comprises:
in the first pipeline stage, performing the FF_j operation on the state words A_i, B_i and C_i selected by the second group of selectors; feeding the FF_j result, the state word D_i selected by the second group of selectors and the message parameter W'_j into a first 3-input, 2-output CSA unit to perform a first CSA operation;
feeding the two words selected by the third group of selectors (the cyclic left shift result of A_i, and 0) and the cyclic left shift result of T_j into a second 3-input, 2-output CSA unit to perform a second CSA operation; inputting the second CSA result into a 32-bit first addition unit to perform a first addition operation; and performing a cyclic left shift on the first addition result to obtain intermediate variable S1;
performing a cyclic left shift on the state word E_i selected by the first group of selectors to obtain intermediate variable S2; performing the GG_j operation on the state words G_i, F_i and E_i selected by the first group of selectors; feeding the GG_j result, the message word W_j and the state word H_i selected by the first group of selectors into a third 3-input, 2-output CSA unit to perform a third CSA operation; then feeding S1, S2 and the third CSA result into a first 4-input, 2-output CSA unit to perform a fourth CSA operation;
and feeding S1 and S2 into a 32-bit second addition unit to perform a second addition operation, obtaining intermediate variable SS1.
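The first-stage reduction can be sketched in Python as below. The sketch deliberately deviates from the claim in one respect. It computes SS1 directly as ((A_i <<< 12) + E_i + (T_j <<< j)) <<< 7, as GB/T 32905-2016 defines it, instead of through the S1/S2 split named in the claim. As a result, its fourth reduction is a 3-input CSA rather than a 4-input one. The CSA form and the T_j constants come from the standard or are assumed, and all function names are introduced for illustration.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def csa32(x: int, y: int, z: int) -> tuple:
    """3-input, 2-output CSA: sum + carry == x + y + z (mod 2**32)."""
    s = x ^ y ^ z
    c = (((x & y) | (x & z) | (y & z)) << 1) & MASK32
    return s, c


def ff(j: int, x: int, y: int, z: int) -> int:
    return x ^ y ^ z if j < 16 else (x & y) | (x & z) | (y & z)


def gg(j: int, x: int, y: int, z: int) -> int:
    return x ^ y ^ z if j < 16 else (x & y) | (~x & z)


def stage1(state, j, w_j, w_prime_j):
    """First pipeline stage: intermediate data for TT1 and TT2."""
    a, b, c, d, e, f, g, h = state
    t_j = 0x79CC4519 if j < 16 else 0x7A879D8A
    a12 = rotl32(a, 12)
    csa1 = csa32(ff(j, a, b, c), d, w_prime_j)            # FF_j + D + W'_j
    # SS1 exactly as the standard defines it (not split into S1 and S2)
    ss1 = rotl32((a12 + e + rotl32(t_j, j)) & MASK32, 7)
    s3, c3 = csa32(gg(j, e, f, g), w_j, h)                # GG_j + W_j + H
    csa4 = csa32(s3, c3, ss1)                             # GG_j + W_j + H + SS1
    return {"csa1": csa1, "csa4": csa4, "ss1": ss1, "a12": a12}
```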
24. The processor of claim 23, wherein the calculation logic calculating, in the second pipeline stage, the intermediate variable TT1 from the intermediate data and determining the state word A_{i+1} from TT1 comprises, in the second pipeline stage:
- performing an exclusive-or operation on the intermediate variable SS1 and the cyclic left shift result of the state word A_i selected by the second group of selectors, to obtain intermediate variable SS2;
- feeding SS2 and the first CSA result into a fourth 3-input, 2-output CSA unit to perform a fifth CSA operation, obtaining intermediate state words As and Ac;
- feeding As and Ac into a 32-bit fourth addition unit to perform a fourth addition operation, obtaining intermediate variable TT1;
- processing TT1 with the assignment operator (←) to obtain the state word A_{i+1};

the calculation logic calculating, in the second pipeline stage, the intermediate variable TT2 from the intermediate data and determining the state word E_{i+1} from TT2 comprises:
- feeding the fourth CSA result into a 32-bit third addition unit to perform a third addition operation, obtaining intermediate variable TT2;
- passing TT2 through the P_0 function to obtain the state word E_{i+1}.
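The second-stage step of this claim can be sketched in Python as follows. The sketch takes as inputs the intermediate data that a first stage would latch: the first and fourth CSA pairs, SS1, and A_i <<< 12. These input names, the 12-bit rotation taken from GB/T 32905-2016, and the 3:2 CSA form are assumptions, not details stated in the claim. The function also returns (As, Ac), the pair a bypass unit would forward.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p0(x: int) -> int:
    return x ^ rotl32(x, 9) ^ rotl32(x, 17)


def csa32(x: int, y: int, z: int) -> tuple:
    s = x ^ y ^ z
    c = (((x & y) | (x & z) | (y & z)) << 1) & MASK32
    return s, c


def stage2(csa1, csa4, ss1, a12):
    """Second pipeline stage: finish A_{i+1} and E_{i+1}."""
    ss2 = ss1 ^ a12                            # SS2 = SS1 xor (A_i <<< 12)
    as_, ac = csa32(csa1[0], csa1[1], ss2)     # fifth CSA -> As, Ac
    tt1 = (as_ + ac) & MASK32                  # fourth addition -> TT1, A_{i+1} <- TT1
    tt2 = (csa4[0] + csa4[1]) & MASK32         # third addition -> TT2
    return tt1, p0(tt2), (as_, ac)             # A_{i+1}, E_{i+1}, forwarded pair
```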
25. A chip comprising a processor as claimed in any one of claims 14 to 24.
26. An electronic device comprising the chip of claim 25.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210493013.1A CN114978473B (en) 2022-05-07 2022-05-07 SM3 algorithm processing method, processor, chip and electronic equipment

Publications (2)

Publication Number Publication Date
CN114978473A true CN114978473A (en) 2022-08-30
CN114978473B CN114978473B (en) 2024-03-01

Family

ID=82981765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210493013.1A Active CN114978473B (en) 2022-05-07 2022-05-07 SM3 algorithm processing method, processor, chip and electronic equipment

Country Status (1)

Country Link
CN (1) CN114978473B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101430737A (en) * 2008-11-19 2009-05-13 西安电子科技大学 Wavelet transformation-improved VLSI structure design method
US20160092688A1 (en) * 2014-09-26 2016-03-31 Gilbert M. Wolrich Instructions and logic to provide simd sm3 cryptographic hashing functionality
CN106575215A (en) * 2014-09-04 2017-04-19 英特尔公司 Emulation of fused multiply-add operations
CN108427575A (en) * 2018-02-01 2018-08-21 深圳市安信智控科技有限公司 Fully pipelined architecture SHA-2 extension of message optimization methods
CN112367158A (en) * 2020-11-06 2021-02-12 海光信息技术股份有限公司 Method for accelerating SM3 algorithm, processor, chip and electronic equipment
CN113076277A (en) * 2021-03-26 2021-07-06 大唐微电子技术有限公司 Method and device for realizing pipeline scheduling, computer storage medium and terminal
CN113282947A (en) * 2021-07-21 2021-08-20 杭州安恒信息技术股份有限公司 Data encryption method and device based on SM4 algorithm and computer platform

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116318660A (en) * 2023-01-12 2023-06-23 成都海泰方圆科技有限公司 Message expansion and compression method and related device
CN116318660B (en) * 2023-01-12 2023-12-08 成都海泰方圆科技有限公司 Message expansion and compression method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant