CN114978473B - SM3 algorithm processing method, processor, chip and electronic equipment - Google Patents

Publication number
CN114978473B
Authority
CN
China
Prior art keywords
word
message
calculation
round
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210493013.1A
Other languages
Chinese (zh)
Other versions
CN114978473A (en)
Inventor
姚涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haiguang Information Technology Co Ltd
Original Assignee
Haiguang Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haiguang Information Technology Co Ltd
Priority to CN202210493013.1A
Publication of CN114978473A
Application granted
Publication of CN114978473B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols; the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0643: Hash functions, e.g. MD5, SHA, HMAC or f9 MAC
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols; the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0618: Block ciphers, i.e. encrypting groups of characters of a plain text message using fixed encryption transformation
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2209/00: Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
    • H04L2209/12: Details relating to cryptographic hardware or logic circuitry
    • H04L2209/125: Parallelization or pipelining, e.g. for accelerating processing of cryptographic operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Power Engineering (AREA)
  • Executing Machine-Instructions (AREA)
  • Advance Control (AREA)

Abstract

The embodiments of the application provide a processing method, a processor, a chip and electronic equipment for the SM3 algorithm. The method comprises: reading the message word and the message parameter corresponding to the current round of calculation from a first source operand of a first source operand register, wherein the first source operand includes a plurality of message words and message parameters, and the bit width of the first source operand is the same as the bit width of the round calculation result; according to a single round calculation instruction, calculating the state word of the next state by using the state word of the previous state and the message word and message parameter corresponding to the current round of calculation; and storing the state word of the next state in a first destination operand of a first destination operand register, wherein the bit width of the first destination operand is the same as the bit width of the round calculation result. The embodiments of the application can improve the processing speed of the SM3 algorithm.

Description

SM3 algorithm processing method, processor, chip and electronic equipment
Technical Field
The embodiment of the application relates to the technical field of cryptography, in particular to a processing method, a processor, a chip and electronic equipment of an SM3 algorithm.
Background
The SM3 algorithm is a cryptographic hash function standard that, for a message of length less than $2^{64}$ bits, produces a 256-bit hash value; the hash value is the message digest (a bit string) output when the hash algorithm is applied to the message. SM3 is essentially a cryptographic hash algorithm and can be used in commercial cryptography scenarios that require cryptographic security, such as digital signature generation and verification, generation and verification of message authentication codes, and random number generation. Given the wide application of the SM3 algorithm, how to improve its processing speed has become a problem to be solved urgently by those skilled in the art.
Disclosure of Invention
In view of this, the embodiments of the present application provide a processing method, a processor, a chip and an electronic device for an SM3 algorithm, so as to improve the processing speed of the SM3 algorithm.
In order to achieve the above purpose, the embodiments of the present application provide the following technical solutions.
In a first aspect, an embodiment of the present application provides a processing method of an SM3 algorithm, including:
reading a message word and a message parameter corresponding to the current round of calculation from a first source operand of a first source operand register; wherein the first source operand includes a plurality of message words and message parameters, and the bit width of the first source operand is the same as the bit width of the round calculation result;
according to a single round calculation instruction, calculating the state word of the next state by using the state word of the previous state and the message word and message parameter corresponding to the current round of calculation;
and storing the state word of the next state in a first destination operand of a first destination operand register, wherein the bit width of the first destination operand is the same as the bit width of the round calculation result.
In a second aspect, embodiments of the present application provide a processor, comprising: a round calculation unit and operand registers; the operand registers include a first source operand register and a first destination operand register; wherein the first source operand register holds a first source operand, the first source operand includes a plurality of message words and message parameters, and the bit width of the first source operand is the same as the bit width of the round calculation result; the first destination operand register holds a first destination operand, and the bit width of the first destination operand is the same as the bit width of the round calculation result;
the processor is configured with a single round calculation instruction; the round calculation unit is used for reading the message word and the message parameter corresponding to the current round of calculation from the first source operand of the first source operand register; according to the single round calculation instruction, calculating the state word of the next state by using the state word of the previous state and the message word and message parameter corresponding to the current round of calculation; and storing the state word of the next state in the first destination operand of the first destination operand register.
In a third aspect, embodiments of the present application provide a chip including a processor as described above.
In a fourth aspect, embodiments of the present application provide an electronic device including a chip as described above.
In the embodiments of the application, a first source operand register and a first destination operand register are provided with the same bit width as the round calculation result. The first source operand in the first source operand register stores a plurality of message words and a plurality of message parameters for round calculation, and the first destination operand in the first destination operand register stores the result of each round of calculation, so that the round calculation of the SM3 algorithm is built on a source operand and a destination operand whose bit width equals that of the round calculation result. When performing the current round of calculation, the embodiments of the application may read the message word and the message parameter corresponding to the current round from the first source operand of the first source operand register; according to a single round calculation instruction, calculate the state word of the next state by using the state word of the previous state and the message word and message parameter corresponding to the current round; and store the state word of the next state in the first destination operand of the first destination operand register. With source and destination operand registers of the same bit width as the round calculation result, the embodiments of the application can therefore fully use the bit width of the operand registers at a vector granularity equal to the bit width of the round calculation result, and each round of calculation is completed by a single round calculation instruction, which increases the round calculation speed and hence the processing speed of the SM3 algorithm.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings may be obtained according to the provided drawings without inventive effort to a person skilled in the art.
Fig. 1 is a diagram of an example processor implementation of the SM3 algorithm for 128-bit operand registers.
Fig. 2 is a diagram illustrating an example implementation of a processor of the SM3 algorithm provided in an embodiment of the present application.
Fig. 3 is a flowchart of a processing method of the SM3 algorithm provided in the embodiment of the present application.
Fig. 4 is a flowchart of another processing method of the SM3 algorithm provided in the embodiment of the present application.
FIG. 5 is a diagram of an example of hardware for performing round computation according to an embodiment of the present application.
Fig. 6A is a flowchart of another processing method of the SM3 algorithm according to an embodiment of the present application.
Fig. 6B is a flowchart of a method for calculating a status word by using the SM3 algorithm according to an embodiment of the present application.
Fig. 7 is an exemplary diagram of a round calculation unit provided in an embodiment of the present application.
Fig. 8 is a diagram of another hardware example of performing round computation according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
For a message of length less than $2^{64}$ bits, the SM3 algorithm pads the message so that its length is a multiple of 512 bits, groups the padded message into a number of 512-bit message blocks, and then iteratively compresses each 512-bit message block in turn to output a 256-bit hash value. In the SM3 algorithm, the iterative compression of a message block mainly involves message expansion and round calculation.
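The padding and grouping step can be sketched as follows. This is a minimal illustration, not part of the patent text: the padding rule (a single 1 bit, then zero bits, then the 64-bit big-endian bit length) is taken from the published SM3 standard, and the function names `sm3_pad` and `split_blocks` are hypothetical.

```python
def sm3_pad(message: bytes) -> bytes:
    """Pad a byte-aligned message to a multiple of 512 bits (64 bytes).

    Assumed rule from the published SM3 standard: append a 1 bit,
    then zero bits, then the original bit length as a 64-bit
    big-endian integer.
    """
    bit_len = len(message) * 8
    padded = message + b"\x80"                          # the single 1 bit
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)  # zero fill up to 56 mod 64
    padded += bit_len.to_bytes(8, "big")                # 64-bit message length
    return padded


def split_blocks(padded: bytes):
    """Group the padded message into 512-bit (64-byte) message blocks."""
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]
```

For a 3-byte message such as `b"abc"` the padded result is exactly one 64-byte block; a 56-byte message no longer leaves room for the length field and therefore spills into a second block.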
During message expansion, the message block may be divided into the 16 initial message words $W_0$ to $W_{15}$, which are then expanded to generate the 68 message words $W_0$ to $W_{67}$ and the 64 message parameters $W'_0$ to $W'_{63}$. When generating the message parameters, the message words $W_i$ and $W_{i+4}$ may be used to obtain the message parameter $W'_i$, for example $W'_i = W_i \oplus W_{i+4}$, $i \in \{0, \dots, 63\}$. When generating the message words, the message words $W_{16}$ to $W_{67}$ may be generated by expanding the 16 initial message words $W_0$ to $W_{15}$ and the message words already obtained by expansion, for example:
$W_m \leftarrow P_1(W_{m-16} \oplus W_{m-9} \oplus (W_{m-3} \lll 15)) \oplus (W_{m-13} \lll 7) \oplus W_{m-6}$, $m \in \{16, \dots, 67\}$, where $\oplus$ denotes 32-bit exclusive-or and $\lll$ denotes a cyclic left shift.
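The two expansion formulas can be sketched in software as follows. This is a minimal sketch under stated assumptions: the text above does not spell out the permutation $P_1$, so the definition $P_1(X) = X \oplus (X \lll 15) \oplus (X \lll 23)$ and the big-endian split of the block into 32-bit words are taken from the published SM3 standard; `rotl32`, `p1` and `expand_message` are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Cyclic left shift of a 32-bit word."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p1(x: int) -> int:
    """Permutation P1 (assumed from the published SM3 standard)."""
    return x ^ rotl32(x, 15) ^ rotl32(x, 23)


def expand_message(block: bytes):
    """Expand one 512-bit block into W[0..67] and W'[0..63]."""
    assert len(block) == 64
    # 16 initial message words, big-endian (assumed from the standard)
    w = [int.from_bytes(block[4 * i:4 * i + 4], "big") for i in range(16)]
    for m in range(16, 68):
        w.append(p1(w[m - 16] ^ w[m - 9] ^ rotl32(w[m - 3], 15))
                 ^ rotl32(w[m - 13], 7) ^ w[m - 6])
    # message parameters W'_i = W_i xor W_{i+4}
    w_prime = [w[i] ^ w[i + 4] for i in range(64)]
    return w, w_prime
```

Because $W'_{63}$ needs $W_{67}$, the word schedule has 68 entries even though only 64 rounds consume a message word directly.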
During round calculation, the SM3 algorithm starts from the 8 initial state words ($A_0$, $B_0$, $C_0$, $D_0$, $E_0$, $F_0$, $G_0$, $H_0$) and adds the corresponding message word and message parameter in each round, so that the 8 state words output by the final round are obtained through multiple rounds of iterative calculation; the 8 state words output by the final round can form a 256-bit hash value. Each round of calculation uses the message word and the message parameter corresponding to that round.
For each round of calculation, the processor needs to read the source operand of the round (e.g., the message word and message parameter corresponding to the current round) from a source operand register and write the calculated state words to a destination operand register. With 128-bit operand registers, the round calculation of SM3 is performed at a vector granularity of 128 bits, so the operand bit width does not match the 256-bit result of a round calculation, and the processing speed of SM3 is limited. That is, because of the operand bit width limit of the operand registers (source operand registers and destination operand registers), each round of SM3 calculation has to be completed by multiple round calculation instructions, which limits the processing speed of SM3.
For ease of illustration, FIG. 1 shows an example processor implementation of the SM3 algorithm based on 128-bit operand registers. As shown in FIG. 1, the instruction set of the processor includes, for round calculation, an SM3 two-round four state word update instruction 111 and an SM3 two-round four remaining state word update instruction 112; and the operand registers of the processor include a source operand register 121 and a source operand register 122, each 128 bits wide, and a destination operand register 131 and a destination operand register 132, each 128 bits wide.
In round calculation, since message words and message parameters are 32 bits each, the 128-bit source operand register 121 may hold 4 message words for 4 rounds and the 128-bit source operand register 122 may hold 4 message parameters for 4 rounds. Meanwhile, since one destination operand register is 128 bits wide and the result of one round of calculation is 256 bits (i.e., a 256-bit hash value is formed from 8 state words of 32 bits each), the processor needs two round calculation instructions to complete the state word calculation of the current round. For example, the SM3 two-round four state word update instruction 111 calculates 4 state words of the current round based on the message words and message parameters stored in the source operand register 121 and the source operand register 122, and stores them in the destination operand register 131; the SM3 two-round four remaining state word update instruction 112 calculates the remaining 4 state words of the current round based on the message words and message parameters stored in the source operand register 121 and the source operand register 122, and stores them in the destination operand register 132.
It can be seen that, under the bit width constraint of 128-bit operand registers, the round calculation of SM3 is performed at a vector granularity of 128 bits, so one round of calculation has to be split across multiple round calculation instructions, which limits the processing speed of the SM3 algorithm. Meanwhile, as high-performance processors develop, vector components can provide wider operands and more parallel computation; if SM3 calculation stays at a 128-bit vector granularity, it cannot keep pace with the performance development of the processor.
Based on this, the embodiments of the present application provide an improved processing scheme of the SM3 algorithm, and propose a single round calculation instruction based on an operand register (for example, a 256-bit operand register) with the same bit width as the round calculation result, so as to perform round calculation on the same vector granularity (for example, a 256-bit vector granularity) as the bit width of the round calculation result, thereby realizing an increase in the processing speed of the SM3 algorithm. That is, the bit width of the operand register is fully utilized on the basis that the bit width of the operand register corresponds to the bit width of the round calculation result, and each round of calculation is realized by a single round calculation instruction.
As an alternative implementation, taking the example of using 256-bit operand registers by the processor, fig. 2 illustrates an example diagram of a processor implementation of the SM3 algorithm provided in an embodiment of the present application. It should be noted that, the bit width of the operand register may be the same as the bit width of the round calculation result, and based on the current round calculation result being 256 bits, the embodiment of the present application will be described by taking the bit width of the operand register as 256 bits as an example. Referring to fig. 2, the processor may include a round calculation unit 210 for round calculation, and a message extension unit 220 for message extension; and the instruction set of the processor includes: a single round calculation instruction 211 for round calculation (such as the round calculation unit 210 executing the single round calculation instruction 211 to implement round calculation of the SM3 algorithm), and a single message extension instruction 221 for message extension (such as the message extension unit 220 executing the message extension instruction 221 to implement message extension of the SM3 algorithm). The message expansion instruction 221 provided in the embodiment of the present application may also perform message expansion based on 256-bit vector granularity, similar to the round calculation performed by the single round calculation instruction 211 based on 256-bit vector granularity. In one example, single round calculation instruction 211 may be referred to as VSM3RND256 and message extension instruction 221 may be referred to as VSM3MSG256.
As further shown in fig. 2, operand registers may be provided in the processor, the provided operand registers may include: a first source operand register 231, a second source operand register 232, a third source operand register 233, a first destination operand register 241, and a second destination operand register 242; wherein each operand register (including each source operand register and each destination operand register) has the same bit width as the round result, e.g., each operand register has a bit width of 256 bits.
In the present embodiment, the first source operand register 231 and the first destination operand register 241 are used for round calculation. As an alternative implementation, the first source operand register 231 may hold message words and message parameters totaling 256 bits; for example, the first source operand register 231 may hold 4 message words ($W_j$, $W_{j+1}$, $W_{j+2}$, $W_{j+3}$) and 4 message parameters ($W'_j$, $W'_{j+1}$, $W'_{j+2}$, $W'_{j+3}$), where j represents the current round number of the round calculation, $W_{j+3}$ does not go beyond $W_{67}$, and $W'_{j+3}$ does not go beyond $W'_{63}$.
In the current round calculation, the round calculation unit 210 may execute the single round calculation instruction 211 to calculate 8 status words of the current round based on the corresponding message words and message parameters of the current round calculation stored in the first source operand register 231, and store the 8 status words generated by the current round calculation in the first destination operand register 241.
The second source operand register 232, the third source operand register 233 and the second destination operand register 242 are used for message expansion. As an alternative implementation, the second source operand register 232 may store the 8 message words with the largest sequence numbers that have been obtained, and the third source operand register 233 may store the 8 message words preceding the message word stored in the second source operand register 232; in performing message expansion, the message expansion unit 220 may execute the message expansion instruction 221 to expand the next 8 message words based on the message words stored in the second source operand register 232 and the third source operand register 233, and store the expanded 8 message words in the second destination operand register 242.
For example, starting from the 16 initially partitioned message words $W_0$ to $W_{15}$, during message expansion the second source operand register 232 may store the 8 message words $W_8$ to $W_{15}$ (one message word is 32 bits), and the third source operand register 233 may store the 8 message words $W_0$ to $W_7$ that precede them. The message expansion unit 220 may execute the message expansion instruction 221 to expand the next 8 message words $W_{16}$ to $W_{23}$ from the 16 message words $W_0$ to $W_{15}$ stored in the second source operand register 232 and the third source operand register 233; the 8 expanded message words $W_{16}$ to $W_{23}$ may be stored in the second destination operand register 242.
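The eight-words-per-instruction expansion described above can be modeled in software roughly as follows. This is only a behavioral sketch: it treats each 256-bit register as a list of eight 32-bit words, the function name `expand_next8` is hypothetical, and the permutation $P_1$ is assumed from the published SM3 standard. How the hardware resolves the dependency of the later new words on the earlier new words (each new word uses the word three positions before it) is not described here, so the sketch simply computes them in order.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Cyclic left shift of a 32-bit word."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p1(x: int) -> int:
    """Permutation P1 (assumed from the published SM3 standard)."""
    return x ^ rotl32(x, 15) ^ rotl32(x, 23)


def expand_next8(older8, newer8):
    """Model of one message expansion step.

    older8: the 8 words held by the third source operand register
    newer8: the 8 words held by the second source operand register
    returns the next 8 message words (second destination operand).
    """
    window = list(older8) + list(newer8)   # 16 most recent words
    for _ in range(8):
        m = len(window)
        window.append(p1(window[m - 16] ^ window[m - 9]
                         ^ rotl32(window[m - 3], 15))
                      ^ rotl32(window[m - 13], 7) ^ window[m - 6])
    return window[16:]
```

Feeding the result back as the new `newer8`, with the previous `newer8` becoming `older8`, would walk the schedule from $W_{16}$ up to $W_{67}$ in seven steps, the last of which needs only 4 of its 8 outputs.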
Based on the single round calculation instruction and the message expansion instruction provided by the embodiment of the application, the following describes a processing method of the SM3 algorithm provided by the embodiment of the application. As an alternative implementation, fig. 3 illustrates a flowchart of a processing method of the SM3 algorithm provided in an embodiment of the present application, where the method flow may be implemented by a processor executing a single round calculation instruction (e.g., a round calculation unit in the processor executes a single round calculation instruction, implementing the method flow). Referring to fig. 3, the method flow may include the following steps.
In step S31, reading a message word and a message parameter corresponding to the current round of calculation from a first source operand of a first source operand register; wherein the first source operand includes a plurality of message words and message parameters and the bit width of the first source operand is the same as the bit width of the round calculation result.
In the embodiments of the application, when a first source operand register with the same bit width as the round calculation result is provided, the first source operand can be held in the first source operand register, and the bit width of the first source operand is the same as the bit width of the round calculation result. For example, when a 256-bit-wide first source operand register is provided, it can hold a 256-bit-wide first source operand. The first source operand can be regarded as the source operand of the round calculation; since message words and message parameters are used during round calculation, the first source operand may include a plurality of message words and a plurality of message parameters whose total bit width equals the bit width of the round calculation result. For example, if the current round of calculation is the j-th round, the first source operand may include 4 message words ($W_j$, $W_{j+1}$, $W_{j+2}$, $W_{j+3}$) and 4 message parameters ($W'_j$, $W'_{j+1}$, $W'_{j+2}$, $W'_{j+3}$).
Based on the first source operand stored in the first source operand register, the embodiments of the application can read the message word and the message parameter corresponding to the current round of calculation from the first source operand when performing the current round. As an alternative implementation, the embodiments of the application may, based on the current round number of the round calculation, read from the first source operand the message word and the message parameter corresponding to that round number. In one example, the round calculation may be performed for 64 rounds; in the j-th round, the embodiments of the application may read the message word $W_j$ and the message parameter $W'_j$ from the first source operand.
In step S32, according to the single round calculation instruction, the state word of the next state is calculated by using the state word of the previous state and the message word and message parameter corresponding to the current round of calculation.
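One way to picture the 256-bit first source operand is as eight 32-bit lanes. The sketch below is purely illustrative: the lane order (message words in the low 128 bits, message parameters in the high 128 bits) and the selection of a lane by the low two bits of the round number are assumptions made for this example, not details confirmed by the text, and `pack_src1` and `read_round_inputs` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def pack_src1(words4, params4) -> int:
    """Pack 4 message words and 4 message parameters into one 256-bit value.

    Assumed layout: lanes 0..3 hold the message words,
    lanes 4..7 hold the message parameters.
    """
    assert len(words4) == 4 and len(params4) == 4
    value = 0
    for lane, word in enumerate(list(words4) + list(params4)):
        value |= (word & MASK32) << (32 * lane)
    return value


def read_round_inputs(src1: int, j: int):
    """Read (W_j, W'_j) for round j from the packed operand.

    Assumes the operand was loaded with the 4 words starting at a
    round number that is a multiple of 4, so the lane is j mod 4.
    """
    lane = j % 4
    w_j = (src1 >> (32 * lane)) & MASK32
    wp_j = (src1 >> (32 * (4 + lane))) & MASK32
    return w_j, wp_j
```

Under these assumptions one operand load serves four consecutive rounds, so 64 rounds need 16 loads of the first source operand.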
After the message word and the message parameter corresponding to the current round of calculation are read from the first source operand stored in the first source operand register, the embodiment of the application can calculate the state word of the next state according to the single round of calculation instruction, and the state word of the next state can be regarded as the calculation result of the current round of calculation. Optionally, when calculating the status word of the next state according to the status word of the previous state, the single round calculation instruction may add the message word and the message parameter corresponding to the current round calculation, so as to obtain the calculation result of the current round.
In one example, assuming that the j-th round of calculation is currently being performed, then based on the state word of the previous state ($A_i$, $B_i$, $C_i$, $D_i$, $E_i$, $F_i$, $G_i$, $H_i$) and the message word $W_j$ and message parameter $W'_j$ corresponding to the current round, the embodiments of the application can execute a single round calculation instruction to calculate the state word of the next state ($A_{i+1}$, $B_{i+1}$, $C_{i+1}$, $D_{i+1}$, $E_{i+1}$, $F_{i+1}$, $G_{i+1}$, $H_{i+1}$). For example, in the first round of calculation, the state word of the previous state is the initial state word ($A_0$, $B_0$, $C_0$, $D_0$, $E_0$, $F_0$, $G_0$, $H_0$), the calculated state word of the next state is ($A_1$, $B_1$, $C_1$, $D_1$, $E_1$, $F_1$, $G_1$, $H_1$), and so on.
In some embodiments, while executing the single round calculation instruction to calculate the state word of the next state, the embodiments of the present application may calculate a first partial state word of the next state from the state word of the previous state; calculate intermediate variables from the state word of the previous state and the message word and message parameter corresponding to the current round; and then determine a second partial state word of the next state from the intermediate variables. The state words are thus divided into a first partial state word and a second partial state word, and the calculated first and second partial state words of the next state together form the state word of the next state.
As an optional implementation, the embodiments of the present application may take (B, C, D, F, G, H) as the first partial state word and (A, E) as the second partial state word. The next state of the first partial state word (B, C, D, F, G, H) can be determined directly from the state word of the previous state. For the next state of the second partial state word (A, E), the embodiments of the application may determine intermediate variables from the state word of the previous state and the message word and message parameter corresponding to the current round, and then determine the next state of the second partial state word from the calculated intermediate variables.
In one example, assume that the current round number of the round calculation is j (j may be carried in the immediate imm8) and that the result of the round calculation is ($A_{i+1}$, $B_{i+1}$, $C_{i+1}$, $D_{i+1}$, $E_{i+1}$, $F_{i+1}$, $G_{i+1}$, $H_{i+1}$). Then, based on the state word of the previous state ($A_i$, $B_i$, $C_i$, $D_i$, $E_i$, $F_i$, $G_i$, $H_i$) and the message word $W_j$ and message parameter $W'_j$ corresponding to the current round, the embodiments of the present application may execute a single round calculation instruction to calculate the first partial state word of the next state ($B_{i+1}$, $C_{i+1}$, $D_{i+1}$, $F_{i+1}$, $G_{i+1}$, $H_{i+1}$) and the intermediate variables SS1, SS2, TT1 and TT2; the second partial state word of the next state ($A_{i+1}$, $E_{i+1}$) is then calculated from the intermediate variables TT1 and TT2.
As an alternative implementation, the calculation process of the single round calculation instruction may be represented by the following formula, for example:
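$SS1 \leftarrow ((A_i \lll 12) + E_i + (T_j \lll j)) \lll 7$;
$SS2 \leftarrow SS1 \oplus (A_i \lll 12)$;
$TT1 \leftarrow FF_j(A_i, B_i, C_i) + D_i + SS2 + W'_j$;
$TT2 \leftarrow GG_j(E_i, F_i, G_i) + H_i + SS1 + W_j$;
$D_{i+1} \leftarrow C_i$;
$C_{i+1} \leftarrow B_i \lll 9$;
$B_{i+1} \leftarrow A_i$;
$A_{i+1} \leftarrow TT1$;
$H_{i+1} \leftarrow G_i$;
$G_{i+1} \leftarrow F_i \lll 19$;
$F_{i+1} \leftarrow E_i$;
$E_{i+1} \leftarrow P_0(TT2)$.
Here, $\oplus$ represents a 32-bit exclusive-or operation; $\lll$ represents a cyclic left shift operation; $\leftarrow$ is the left assignment operator; $T_j$ is an algorithm constant that takes different values as j changes; $P_0$ represents the permutation function in the round calculation, and $P_0(X)$ can be expressed as $P_0(X) = X \oplus (X \lll 9) \oplus (X \lll 17)$; $FF_j$ and $GG_j$ represent Boolean functions that take different expressions as j changes, specifically:
$FF_j(X, Y, Z) = \begin{cases} X \oplus Y \oplus Z, & 0 \le j \le 15 \\ (X \wedge Y) \vee (X \wedge Z) \vee (Y \wedge Z), & 16 \le j \le 63 \end{cases}$
$GG_j(X, Y, Z) = \begin{cases} X \oplus Y \oplus Z, & 0 \le j \le 15 \\ (X \wedge Y) \vee (\neg X \wedge Z), & 16 \le j \le 63 \end{cases}$
Further, $\wedge$ represents a 32-bit AND operation, $\vee$ represents a 32-bit OR operation, and $\neg$ represents a 32-bit NOT operation.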
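As a software illustration of what one execution of the single round calculation instruction computes, the formulas above can be sketched as follows. This models only the arithmetic, not the instruction encoding or the hardware of the embodiments. The values of the constants $T_j$, the reduction of the rotation amount of $T_j$ modulo 32, and the treatment of + as addition modulo $2^{32}$ are taken from the published SM3 standard; `sm3_round` and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Cyclic left shift of a 32-bit word (shift amount taken mod 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p0(x: int) -> int:
    """Permutation P0 used in the round calculation."""
    return x ^ rotl32(x, 9) ^ rotl32(x, 17)


def t_const(j: int) -> int:
    """Algorithm constant T_j (values assumed from the SM3 standard)."""
    return 0x79CC4519 if j < 16 else 0x7A879D8A


def ff(j: int, x: int, y: int, z: int) -> int:
    return (x ^ y ^ z) if j < 16 else ((x & y) | (x & z) | (y & z))


def gg(j: int, x: int, y: int, z: int) -> int:
    return (x ^ y ^ z) if j < 16 else ((x & y) | (~x & z & MASK32))


def sm3_round(state, w_j: int, wp_j: int, j: int):
    """Compute the next 8 state words from the previous 8 state words."""
    a, b, c, d, e, f, g, h = state
    a12 = rotl32(a, 12)
    # intermediate variables
    ss1 = rotl32((a12 + e + rotl32(t_const(j), j)) & MASK32, 7)
    ss2 = ss1 ^ a12
    tt1 = (ff(j, a, b, c) + d + ss2 + wp_j) & MASK32
    tt2 = (gg(j, e, f, g) + h + ss1 + w_j) & MASK32
    # (B, C, D, F, G, H) depend only on the previous state;
    # (A, E) come from the intermediate variables TT1 and TT2
    return (tt1, a, rotl32(b, 9), c, p0(tt2), e, rotl32(f, 19), g)
```

Six of the eight outputs are plain moves or rotations of the previous state, which is why only the two words derived from TT1 and TT2 lie on the arithmetic critical path of a round.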
In step S33, the status word of the next status is stored in a first destination operand of a first destination operand register, where the bit width of the first destination operand is the same as the bit width of the round calculation result.
After calculating the state word of the next state, the embodiments of the present application may store it in a first destination operand, where the first destination operand is held in a first destination operand register and the bit width of the first destination operand is the same as the bit width of the round calculation result. For example, the embodiments of the present application may provide a 256-bit-wide first destination operand register that can store a 256-bit first destination operand, so that after the 256-bit state word of the next state ($A_{i+1}$, $B_{i+1}$, $C_{i+1}$, $D_{i+1}$, $E_{i+1}$, $F_{i+1}$, $G_{i+1}$, $H_{i+1}$) is calculated, it can be stored in the first destination operand of the first destination operand register, thereby fully using the bit width of the first destination operand register.
In some embodiments, the round calculation unit in the processor may execute the single round calculation instruction provided in the embodiments of the present application, so as to implement the method flow shown in fig. 3.
In the embodiments of the application, a first source operand register and a first destination operand register are provided with the same bit width as the round calculation result. The first source operand in the first source operand register stores a plurality of message words and a plurality of message parameters for round calculation, and the first destination operand in the first destination operand register stores the result of each round of calculation, so that the round calculation of the SM3 algorithm is built on a source operand and a destination operand whose bit width equals that of the round calculation result. When performing the current round of calculation, the embodiments of the application may read the message word and the message parameter corresponding to the current round from the first source operand of the first source operand register; according to a single round calculation instruction, calculate the state word of the next state by using the state word of the previous state and the message word and message parameter corresponding to the current round; and store the state word of the next state in the first destination operand of the first destination operand register. With source and destination operand registers of the same bit width as the round calculation result, the embodiments of the application can therefore fully use the bit width of the operand registers at a vector granularity equal to the bit width of the round calculation result, and each round of calculation is completed by a single round calculation instruction, which increases the round calculation speed and hence the processing speed of the SM3 algorithm.
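The bit-width match can be checked with a small sketch: eight 32-bit state words fill a 256-bit destination operand exactly. The lane order used here (A in the lowest 32 bits through H in the highest) is an assumption made for illustration, and `pack_state` and `unpack_state` are hypothetical names.

```python
def pack_state(state) -> int:
    """Pack 8 state words (A..H) into one 256-bit integer.

    Assumed lane order: A occupies bits 0..31, H occupies bits 224..255.
    """
    assert len(state) == 8
    value = 0
    for lane, word in enumerate(state):
        value |= (word & 0xFFFFFFFF) << (32 * lane)
    return value


def unpack_state(dst: int):
    """Recover the 8 state words from a 256-bit destination operand."""
    return tuple((dst >> (32 * lane)) & 0xFFFFFFFF for lane in range(8))
```

With a 128-bit destination register the same state would need two registers and therefore two writes per round, which is the situation the 256-bit operand is meant to avoid.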
In one implementation example, the source operand for round calculation is ymm1, the destination operand is ymm0, and the current round number j for round calculation is stored in an immediate value imm8; based on the round calculation result being 256 bits, the bit widths of ymm1 and ymm0 may be 256 bits, and set to the 256-bit-wide source operand register and the 256-bit-wide destination operand register, respectively; based on the single round calculation instruction VSM3RND256 provided in the embodiments of the present application, the embodiments of the present application may perform a single round calculation operation on a 256-bit vector granularity based on a 256-bit source operand register and a destination operand register, and a round calculation may be completed by 2 beats. For example, in the j-th round of calculationIn this case, ymm1 stores 4 message words (W j ,W j+1 ,W j+2 ,W j+3 ) And 4 message parameters (W j ’,W j+1 ’,W j+2 ’,W j+3 '), 256 bits total; the single round calculation instruction VSM3RND256 provided in the embodiments of the present application can calculate the j-th round calculation result (a i+1 ,B i+1 ,C i+1 ,D i+1 ,E i+1 ,F i+1 ,G i+1 ,H i+1 ) And stored in ymm0.
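As a purely illustrative sketch, the 256-bit source operand described above can be modelled as a Python integer holding eight 32-bit lanes. Several points are assumptions that the text does not specify: the lane order (message words in lanes 0 to 3, message parameters in lanes 4 to 7), the use of j mod 4 to pick the lane for round j, and the helper names. The relation W_j' = W_j ⊕ W_{j+4} is taken from the public SM3 standard.

```python
MASK = 0xFFFFFFFF

def pack_round_operand(w, base):
    """Pack W_base..W_base+3 and W'_base..W'_base+3 into one 256-bit value.

    w is the list of expanded message words; the lane layout is hypothetical.
    """
    words = [w[base + k] for k in range(4)]
    # Assumption (SM3 standard): W'_j = W_j xor W_{j+4}
    params = [w[base + k] ^ w[base + k + 4] for k in range(4)]
    value = 0
    for lane, x in enumerate(words + params):
        value |= (x & MASK) << (32 * lane)
    return value

def select_lane(operand, j):
    """Return (W_j, W_j') for round j, assuming lane index j mod 4."""
    k = j % 4
    w_j = (operand >> (32 * k)) & MASK
    w_j_prime = (operand >> (32 * (k + 4))) & MASK
    return w_j, w_j_prime
```

Under this layout one packed operand serves four consecutive rounds, with the immediate value selecting which word and parameter pair is consumed.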
In further embodiments, the embodiments of the present application may utilize a source operand register and a destination operand register having the same bit width as the bit width of the round calculation result when performing message expansion, thereby completing a round of message word expansion by a single message expansion instruction. For example, embodiments of the present application may complete a round of expansion of 8 message words in 1 beat with a single message expansion instruction. As an alternative implementation, fig. 4 illustrates another processing method flowchart of the SM3 algorithm provided in the embodiment of the present application, where the method flowchart may be implemented by the processor executing a single message expansion instruction (e.g., the message expansion instruction is executed by a message expansion unit in the processor to implement the method flowchart). Referring to fig. 4, the method flow may include the following steps.
In step S41, the message words used for message expansion are read from the second source operand of the second source operand register and the third source operand of the third source operand register; the bit widths of the second source operand and the third source operand are the same as the bit width of the round calculation result, the second source operand includes the first number of already-obtained message words with the largest sequence numbers, and the third source operand includes the first number of message words preceding those in the second source operand.
With a second source operand register and a third source operand register of the same bit width as the round calculation result, the embodiment of the application can hold the second source operand in the second source operand register and the third source operand in the third source operand register, both operands having the same bit width as the round calculation result. For example, when the second and third source operand registers are each 256 bits wide, they hold a 256-bit second source operand and a 256-bit third source operand, respectively.
The second source operand and the third source operand may be considered as source operands for message expansion. In some embodiments, when performing message word expansion, the embodiments of the present application may expand a first number of message words with the same bit width as the round calculation result in one beat (one beat may be considered as one clock cycle of the processor) by a single message expansion instruction. As an alternative implementation, the second source operand may include the first number of message words with the largest sequence number that have been obtained, and the third source operand may include the first number of message words before the second source operand, so that the embodiments of the present application may utilize the second source operand and the third source operand to expand the first number of message words after the second source operand by a single beat.
In one example, given that the round calculation result is 256 bits and one message word is 32 bits, the second source operand may include the 8 already-obtained message words with the largest sequence numbers, and the third source operand may include the 8 message words preceding those in the second source operand. For example, starting from the message words W_0 to W_{15} obtained by the initial division of the message block, the second source operand may hold the 8 message words with the largest sequence numbers, W_8 to W_{15}, and the third source operand may hold the 8 preceding message words, W_0 to W_7. Based on W_0 to W_{15}, a single message expansion instruction expands the next 8 message words, W_{16} to W_{23}, in one beat. Similarly, when the following 8 message words W_{24} to W_{31} are to be expanded, the contents of the second and third source operands are adjusted accordingly: the second source operand holds the 8 message words with the largest sequence numbers, W_{16} to W_{23}, and the third source operand holds the 8 preceding message words, W_8 to W_{15}, and so on.
Based on the message words stored in the second source operand and the third source operand, when message word expansion is performed, the embodiment of the application can read the message words used for message expansion from the second and third source operands and expand the next message words from them. As an alternative implementation, when one message word is expanded, several message words at specified offsets before the message word currently to be expanded may be read from the second and third source operands, namely the 16th, 13th, 9th, 6th and 3rd message words before it. In one example, assuming the message word W_m is currently to be expanded, the 5 message words W_{m-16}, W_{m-13}, W_{m-9}, W_{m-6} and W_{m-3} can be read from the second source operand and the third source operand.
In step S42, according to a single message expansion instruction, the first number of message words following the second source operand are generated by expansion from the read message words.
After the message words for message expansion are read from the second source operand and the third source operand, the embodiment of the present application may, according to a single message expansion instruction, expand the first number of message words following the second source operand in one beat, thereby expanding in one beat a plurality of message words whose total bit width equals that of the round calculation result.
In one example, after the message words at the specified offsets before the message word currently to be expanded are read from the second source operand and the third source operand, that message word may be expanded from the read message words using the permutation function of message expansion and 32-bit exclusive-or operations. For example, assuming the message word W_m is currently to be expanded, W_{m-16}, W_{m-13}, W_{m-9}, W_{m-6} and W_{m-3} can be read from the second source operand and the third source operand, and W_m can then be expanded by the following formula:
W_m ← P_1(W_{m-16} ⊕ W_{m-9} ⊕ (W_{m-3} <<< 15)) ⊕ (W_{m-13} <<< 7) ⊕ W_{m-6}
where P_1 represents the permutation function in message expansion; for a message word X, P_1(X) can be expressed as: P_1(X) = X ⊕ (X <<< 15) ⊕ (X <<< 23).
In step S43, the message words generated by expansion are stored in the second destination operand of the second destination operand register, where the bit width of the second destination operand is the same as the bit width of the round calculation result.
After the first number of message words following the second source operand are generated by expansion, their total bit width is the same as that of the round calculation result, so they can be stored together in the second destination operand of the second destination operand register.
In some embodiments, the message expansion unit in the processor may execute the message expansion instruction provided in the embodiments of the present application, so as to implement the method flow shown in fig. 4.
According to the embodiment of the application, on the basis of providing a second source operand register, a third source operand register and a second destination operand register whose bit width equals that of the round calculation result, the first number of already-obtained message words with the largest sequence numbers can be held in the second source operand of the second source operand register, the first number of message words preceding them can be held in the third source operand of the third source operand register, and the first number of message words generated by expansion can be held in the second destination operand of the second destination operand register, so that the message expansion of the SM3 algorithm is built on source and destination operands of the same bit width as the round calculation result. Within one beat of message expansion, the embodiment of the application may read the message words used for message expansion from the second source operand and the third source operand, use the read message words to expand the following first number of message words according to a single message expansion instruction, and store the generated message words in the second destination operand of the second destination operand register.
According to the embodiment of the application, with operand registers of the same bit width as the round calculation result, the operand register bit width can be fully utilized: a single message expansion instruction generates, in one beat, a plurality of message words whose total bit width equals that of the round calculation result, which improves the message expansion speed and hence the processing speed of the SM3 algorithm.
In one implementation example, the source operands for message expansion are ymm1 and ymm2 and the destination operand is ymm0. Since the round calculation result is 256 bits, ymm1, ymm2 and ymm0 may each be 256 bits wide, with ymm1 and ymm2 held in 256-bit-wide source operand registers and ymm0 held in a 256-bit-wide destination operand register. With the message expansion instruction VSM3MSG256 provided in the embodiment of the present application, 8 message words can be expanded in 1 beat at 256-bit vector granularity. For example, based on the initially divided message words W_0 to W_{15}, ymm1 stores the message words W_8 to W_{15} and ymm2 stores the message words W_0 to W_7; the message expansion instruction VSM3MSG256 can generate the message words W_{16} to W_{23} in one beat through the corresponding formula and store them in ymm0. The expansion of subsequent message words is realized in the same way.
In further embodiments, based on the round calculation scheme provided in the embodiments of the present application, fig. 5 illustrates an optional hardware example for performing round calculation; the hardware structure shown in fig. 5 can complete one round calculation within 2 beats. As shown in fig. 5, in the j-th round calculation, the 8 state words (A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1}) may be calculated from the 8 state words (A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i), a message word W_j and a message parameter W_j'. The specific process is as follows:
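The 8-word expansion described for the message expansion instruction can be sketched in software as below. This is an illustrative model only: the function names are hypothetical, each operand is represented as a list of eight 32-bit words in increasing sequence order, and the loop computes sequentially what the instruction is described as producing in one beat.

```python
MASK = 0xFFFFFFFF

def rotl(x, n):
    """Cyclic left shift of a 32-bit word; n is taken modulo 32."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK

def p1(x):
    """Permutation function of message expansion."""
    return x ^ rotl(x, 15) ^ rotl(x, 23)

def expand8(src2, src3):
    """Expand the next 8 message words.

    src2: the 8 newest words (e.g. W_8..W_15), src3: the 8 words before
    them (e.g. W_0..W_7). Returns the following 8 words (e.g. W_16..W_23).
    """
    w = list(src3) + list(src2)          # w[0..15] = W_{m-16} .. W_{m-1}
    for _ in range(8):
        n = len(w)                       # index of the word being expanded
        x = w[n - 16] ^ w[n - 9] ^ rotl(w[n - 3], 15)
        w.append(p1(x) ^ rotl(w[n - 13], 7) ^ w[n - 6])
    return w[16:]
```

Because W_m depends on W_{m-3}, the fourth and later words of a batch depend on words produced earlier in the same batch; the software loop resolves this by running in order, whereas a one-beat hardware unit would have to chain that dependency combinationally.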
In the first pipeline stage FX1, A_i, C_i, E_i and G_i are passed through the left assignment operator (←) and latched for one beat into the second pipeline stage FX2, giving B_{i+1}, D_{i+1}, F_{i+1} and H_{i+1} in the second stage; at the same time, B_i undergoes <<< 9 to generate C_{i+1} in the second stage, and F_i undergoes <<< 19 to generate G_{i+1} in the second stage;
In the first pipeline stage FX1, T_j <<< j is executed based on the current round number j, and the result is input to the first 32-bit CSA unit (CSA32_1 shown in fig. 5), which also receives A_i <<< 12 and E_i; CSA32_1 compresses its inputs into a 2-term output, which is fed to the first 32-bit adder (Adder32_1 shown in fig. 5); the addition result of Adder32_1 undergoes <<< 7 to obtain the intermediate variable SS1. Note that a CSA (Carry Save Adder) may be 32 bits wide; if the inputs of the CSA unit are a, b and c and its outputs are sum and carry, it computes: sum = a ^ b ^ c, carry = (a & b) | (a & c) | (b & c);
Further, in the first pipeline stage FX1, SS1 is XORed with the result of A_i <<< 12 to obtain the intermediate variable SS2; the intermediate variables SS1 and SS2 may be stored in registers of the second pipeline stage FX2, waiting for the next beat;
At the same time, in the first pipeline stage FX1, the GG_j calculation is performed on j, E_i, F_i and G_i, and the GG_j result, H_i and W_j are input to the second 32-bit CSA unit (CSA32_2 shown in fig. 5); the 2-term output of CSA32_2 is stored in the FX2 registers, waiting for the next beat. Likewise, the FF_j calculation is performed on j, A_i, B_i and C_i, and the FF_j result, D_i and W_j' are input to the third 32-bit CSA unit (CSA32_3 shown in fig. 5); the 2-term output of CSA32_3 is stored in the FX2 registers, waiting for the next beat;
In the next beat, SS1 and the 2-term output of CSA32_2 held in the FX2 registers enter the fourth 32-bit CSA unit (CSA32_4 shown in fig. 5); the 2-term output of CSA32_4 is input to the second 32-bit adder (Adder32_2 shown in fig. 5) to generate TT2, and TT2 passes through the hardware computation of the P_0 function to yield E_{i+1};
SS2 and the 2-term output of CSA32_3 held in the FX2 registers enter the fifth 32-bit CSA unit (CSA32_5 shown in fig. 5), and the 2-term output of CSA32_5 is input to the third 32-bit adder (Adder32_3 shown in fig. 5) to generate TT1; TT1 is passed through the left assignment operator (←) to obtain A_{i+1}.
With the round calculation process shown in fig. 5, a single round calculation instruction completes one round calculation in two beats; that is, of two consecutive beats, one is used in the first pipeline stage to input the state words used by the round calculation, and the other is used in the second pipeline stage to generate the result of that round calculation. Part of the pipeline relationship is exemplified in table 1 below:
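The carry-save structure used above (two chained CSA units followed by one adder, as in CSA32_2, CSA32_4 and Adder32_2) can be illustrated with the following minimal sketch. The function names are hypothetical, and the sketch makes explicit one convention the text does not spell out: the carry word carries weight 2, so it is shifted left by one bit before it is added or fed into the next CSA.

```python
MASK = 0xFFFFFFFF

def csa32(a, b, c):
    """3:2 carry-save compression of three 32-bit words."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry

def adder32(s, carry):
    """Resolve a carry-save pair into one word (carry has weight 2)."""
    return (s + (carry << 1)) & MASK

def add4(w, x, y, z):
    """Sum four words modulo 2**32 with two chained CSAs and one adder."""
    s1, c1 = csa32(w, x, y)                  # e.g. GG_j result, H_i, W_j
    s2, c2 = csa32(s1, (c1 << 1) & MASK, z)  # e.g. plus SS1 in the next beat
    return adder32(s2, c2)
```

The point of the arrangement is that only the final adder propagates carries across all 32 bits; each CSA is a single layer of bitwise logic, which is what allows the four-term sums for TT1 and TT2 to be split across the two pipeline stages.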
TABLE 1
In the round calculation process shown in fig. 5, the result of each round calculation is 8 state words, and the result of one round is the input of the next, so that the round calculation proceeds iteratively and the hash value is obtained from the final round. If 64 iterations are performed at 2 beats each, the round calculation unit needs 128 beats. It can be seen that, in the multi-round calculation of the SM3 algorithm, consecutive rounds are data dependent, so the rounds can only be executed serially, and the throughput of the round calculation needs to be improved.
Based on this, in still further embodiments, the embodiments of the present application provide a round calculation implementation that supports internal bypass, together with a corresponding microarchitecture design: during round calculation, the state word obtained in the second pipeline stage FX2 is returned to the first pipeline stage FX1, where it is available for selection as the state word used by the subsequent round. This removes the stall caused by the data dependency between consecutive rounds, so that consecutive rounds of the SM3 algorithm can be executed in a pipelined manner. At the cost of a small amount of additional delay and hardware, every round after the first completes in one beat, which improves the throughput of the round calculation.
Based on the foregoing idea, as an alternative implementation, fig. 6A illustrates a flowchart of yet another processing method of the SM3 algorithm provided in the embodiment of the present application, where the method flowchart may be implemented by being executed by a processor, for example, by being executed by a round computing unit in the processor, and referring to fig. 6A, the method flowchart may include the following steps.
In step S610, the state word of the next state generated by the current round calculation is returned from the second pipeline stage to the first pipeline stage.
In step S611, if the round following the current round is being calculated, the returned state word of the next state is selected in the first pipeline stage as the state word used by that following round; if the current round is being calculated, the state word of the previous state is selected in the first pipeline stage as the state word used by the current round.
In step S612, according to the single round calculation instruction, when the current round is calculated, the state word generated by the current round is obtained from the selected state word and the corresponding message word and message parameter; and when the round following the current round is calculated, the state word generated by that round is obtained from the selected state word and the corresponding message word and message parameter.
As an alternative implementation, in the first beat of the first round calculation, the embodiment of the present application may select the state word of the initial state as the state word used by the first round, so as to calculate the state word of the next state. After the state word of the next state is calculated in the second pipeline stage, it can be returned to the first pipeline stage through the bypass mechanism, where it is available for selection, alongside the state word of the previous state already present in the first stage, as the state word used by the subsequent round. In a possible implementation, suppose the current round is the j-th round, the state word of the previous state is A_i to H_i, and the state word of the next state generated by the j-th round is A_{i+1} to H_{i+1}. When the j-th round is calculated, A_{i+1} to H_{i+1} have not yet been generated, so A_i to H_i are selected as the state word used by the j-th round. The state word A_{i+1} to H_{i+1} generated by the j-th round is returned to the first pipeline stage through the bypass; with the bypass enabled, A_{i+1} to H_{i+1} can be selected in the first stage as the state word used by the (j+1)-th round. Further, the state word A_{i+2} to H_{i+2} generated by the (j+1)-th round can in turn be returned to the first pipeline stage through the bypass and take part in the state word selection together with the state word of the previous state A_{i+1} to H_{i+1} in the first stage, and so on.
Therefore, the embodiment of the application returns the state word generated by each round of calculation to the first-stage running water through the bypass, so that the process of inputting the state word again in each round of calculation is omitted, and the number of beats occupied by inputting the state word can be greatly saved in the round of calculation.
As an alternative implementation, in the first beat of the first round calculation, the initial state word is input in the first pipeline stage, and the state words used by subsequent rounds are supplied by returning the state word generated in the second pipeline stage to the first pipeline stage. Thus, apart from the first beat of the first round, in which the initial state word is input in the first pipeline stage, in any two subsequent consecutive beats the second pipeline stage produces the state word of one round in one beat and returns it to the first pipeline stage, and, based on the returned state word, the second pipeline stage produces the state word of the next round in the other beat. That is, the state word of one round is generated in the second pipeline stage within one beat and returned to the first stage, and the state word of the next round is generated directly in the second pipeline stage in the next beat, which avoids spending that beat on inputting the state word used by the next round. For ease of understanding, table 2 below illustrates part of the pipeline relationship in accordance with an embodiment of the present application.
TABLE 2
Under the bypass mechanism of the embodiment of the application, the state words used by every round other than the first are supplied by returning the state words generated by the previous round to the first pipeline stage, and the result of the next round is generated directly in the second pipeline stage in the next beat. Consequently only the first round needs 2 beats, and every other round completes in one beat, so that a 64-round calculation can be completed by the round calculation unit in 65 beats (1×2 + 63). Compared with the round calculation process illustrated in fig. 5, which needs 128 beats, returning the result of each round from the second pipeline stage to the first pipeline stage through the bypass roughly halves the beat count of the round calculation, which further improves the processing speed of the SM3 algorithm.
Under the bypass mechanism, the embodiment of the application can further adapt the calculation logic of the round calculation process so that it suits round calculation with bypass. With the state word divided into first partial state words (e.g., B, C, D, F, G, H) and second partial state words (e.g., A, E), after the calculated state word of the next state (e.g., A_{i+1} to H_{i+1}) is returned from the second pipeline stage to the first pipeline stage, a selection can be made between the state word of the next state A_{i+1} to H_{i+1} and the state word of the previous state A_i to H_i.
Taking the current j-th round as an example, the state word of the previous state A_i to H_i may be selected in the first pipeline stage for calculation. As an alternative implementation, the selected A_i, C_i, E_i and G_i may each be passed through the left assignment operator (←) to obtain B_{i+1}, D_{i+1}, F_{i+1} and H_{i+1} in the second pipeline stage; at the same time, in the first pipeline stage, a cyclic left shift is applied to B_i (e.g., B_i <<< 9) to obtain C_{i+1} in the second pipeline stage, and a cyclic left shift is applied to F_i (e.g., F_i <<< 19) to obtain G_{i+1} in the second pipeline stage.
In some further embodiments, the intermediate state words As and Ac obtained in the second pipeline stage may be returned to the first pipeline stage through a bypass; As and Ac are the intermediate state words from which the round result of state word A is directly generated. The cyclic left shift results of As and Ac can thus take part in a selection against the cyclic left shift result of the state word A_i and 0. As an alternative implementation, if the round following the j-th round is being calculated, the cyclic left shift results of the intermediate state words As and Ac are selected in the first pipeline stage for that round; if the current round is being calculated, the cyclic left shift result of the state word A_i and 0 are selected in the first pipeline stage for the current round.
For the second partial state words (A_{i+1}, E_{i+1}), the first pipeline stage may perform calculation on the selected A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i, the message word W_j and the message parameter W_j', so that the state words A_{i+1} and E_{i+1} are generated in the second pipeline stage. As an alternative implementation, fig. 6B illustrates an optional method flowchart for calculating these state words in the SM3 algorithm according to an embodiment of the present application; as shown in fig. 6B, the method flow may include the following steps.
In step S621, in the first pipeline stage, intermediate data for calculating the intermediate variables TT1 and TT2 are determined from the selected state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i, the message word W_j and the message parameter W_j'.
As an alternative implementation, for the j-th round calculation, the FF_j calculation may be performed in the first pipeline stage on the selected state words A_i, B_i and C_i; a first CSA operation is performed on the FF_j result, the selected state word D_i and the message parameter W_j'; and the first CSA operation result and the cyclic left shift result of the selected state word A_i are stored in registers of the second pipeline stage;
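The beat counts quoted above can be checked with a small timing model. This is a hedged sketch, not the patented microarchitecture: the function names are hypothetical, the round function is passed in as an abstract callable, and the model assumes that the result of the second stage can be selected by the first stage within the same beat in which it is produced.

```python
def run_without_bypass(state, rounds, round_fn):
    """Two beats per round: input in stage 1, result in stage 2."""
    beats = 0
    for j in range(rounds):
        beats += 1                    # beat 1: stage FX1 takes in the state words
        state = round_fn(state, j)
        beats += 1                    # beat 2: stage FX2 produces the result
    return state, beats

def run_with_bypass(initial, rounds, round_fn):
    """Stage 2 returns each result to stage 1, which selects it at once."""
    beats, issued = 0, 0
    in_flight = None                  # (state, round index) latched by FX1
    state = initial
    while issued < rounds or in_flight is not None:
        beats += 1
        returned = None
        if in_flight is not None:     # FX2 completes the round in flight
            returned = round_fn(*in_flight)
            state = returned
            in_flight = None
        if issued < rounds:           # FX1 selects bypassed or initial state
            selected = initial if returned is None else returned
            in_flight = (selected, issued)
            issued += 1
    return state, beats
```

With 64 rounds and any deterministic round function, the first model reports 128 beats and the second 65 beats, while both return the same final state; the selector in the second model corresponds to the choice between the returned state word and the state word of the previous state.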
at the same time, in the first stage, according to the selected A i Cyclic left shift operation result of (2), 0, and T j Performing a second CSA operation on the cyclic left shift operation result, performing a first addition operation on the second CSA operation result, and performing a cyclic left shift operation on the first addition operation result to obtain an intermediate variable S1;
at the same time, in the first stage of the pipeline, the selected state word E i Performing cyclic left shift operation to obtain an intermediate variable S2; for selected status word G i 、F i And E is i GG is performed j Calculation according to GG j Operation result, selected status word H i Message word W j Performing a third CSA operation; thus, the third CSA operation result, the intermediate variables S1 and S2 are subjected to a fourth CSA operation, and the fourth CSA operation result is stored in the register of the second stage pipeline; in the first stage of running water, the intermediate variables S1 and S2 are subjected to a second addition operation to obtain an intermediate variable SS1; the intermediate variable SS1 is stored in a register of the second stage pipeline.
In step S622, in the second pipeline stage, an intermediate variable TT1 is calculated from the intermediate data, and the state word A_{i+1} is determined from the intermediate variable TT1; and, in the second pipeline stage, an intermediate variable TT2 is calculated from the intermediate data, and the state word E_{i+1} is determined from the intermediate variable TT2.
As an optional implementation, in the second pipeline stage, the embodiment of the application may perform a third addition operation on the fourth CSA operation result to obtain the intermediate variable TT2, and then pass the intermediate variable TT2 through the P_0 function to obtain the state word E_{i+1}.
Meanwhile, in the second pipeline stage, the embodiment of the application may perform an exclusive-or operation on the intermediate variable SS1 and the cyclic left shift result of the state word A_i to obtain an intermediate variable SS2, and perform a fifth CSA operation on the intermediate variable SS2 and the first CSA operation result to obtain intermediate state words As and Ac. On the one hand, the intermediate state words As and Ac are returned to the first pipeline stage; on the other hand, a fourth addition operation is performed on the intermediate state words As and Ac to obtain the intermediate variable TT1, and the intermediate variable TT1 is processed based on a left-assignment operator to obtain the state word A_{i+1}.
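The carry-save arithmetic on the TT1 path can be illustrated with a minimal Python sketch. It assumes a conventional 3-input 2-output carry-save adder whose two outputs (a sum word and a shifted carry word) add up to the sum of the three inputs modulo 2^32; the function and parameter names are illustrative and do not come from the application.

```python
MASK = 0xFFFFFFFF  # 32-bit words

def csa32(a, b, c):
    """3-input 2-output carry-save adder: returns (sum, carry) with
    (sum + carry) mod 2**32 == (a + b + c) mod 2**32."""
    s = a ^ b ^ c
    carry = (((a & b) | (a & c) | (b & c)) << 1) & MASK
    return s, carry

def tt1_two_stage(ff_out, d, w_prime, ss1, a_rot12):
    """TT1 path as described above: the first CSA runs in stage 1,
    the XOR, fifth CSA and fourth addition run in stage 2."""
    # Stage 1: first CSA operation on FF_j result, D_i and W_j'.
    s1, c1 = csa32(ff_out, d, w_prime)
    # Stage 2: SS2 = SS1 xor (A_i <<< 12), then the fifth CSA operation.
    ss2 = ss1 ^ a_rot12
    a_s, a_c = csa32(ss2, s1, c1)   # intermediate state words As and Ac
    tt1 = (a_s + a_c) & MASK        # fourth addition operation
    return a_s, a_c, tt1
```

The only carry-propagating addition on this path is the final one; the two CSA steps have a delay of roughly one full-adder cell each regardless of word width, which is the usual motivation for this kind of restructuring.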
Based on the method flow of the bypass mechanism provided by the embodiment of the application, the embodiment of the application further provides an optional hardware example for performing round calculation. On the basis of the processor implementation illustrated in fig. 2, fig. 7 illustrates an optional example of a round calculation unit provided in an embodiment of the present application. As illustrated in fig. 7, the round calculation unit may include: a bypass unit 710, multiple groups of selectors 720, and calculation logic 730;
the bypass unit 710 is configured to return the state words of the next state generated by the current round calculation from the second pipeline stage to the first pipeline stage;
the multiple groups of selectors 720 are configured to, if the next round of calculation after the current round is performed, select the returned state words of the next state in the first pipeline stage as the state words used for the next round of calculation; and, if the current round calculation is performed, select the state words of the previous state in the first pipeline stage as the state words used for the current round calculation;
the calculation logic 730 is configured to, in the current round calculation and according to the single round calculation instruction, obtain the state words generated by the current round calculation based on the state words selected by the multiple groups of selectors, the corresponding message word and the corresponding message parameter; and, in the next round of calculation after the current round and according to the single round calculation instruction, obtain the state words generated by the next round of calculation based on the selected state words, the corresponding message word and the corresponding message parameter.
In further embodiments, the bypass unit 710 may be further configured to return the intermediate state words As and Ac obtained in the second pipeline stage to the first pipeline stage; the intermediate state words As and Ac are used to directly generate the one-round calculation result of the state word A;
the multiple groups of selectors 720 are further configured to, if the next round of calculation after the j-th round is performed, select the cyclic left shift results of the intermediate state words As and Ac in the first pipeline stage for the next round of calculation; and, if the current round calculation is performed, select the state word A_i and 0 in the first pipeline stage for the current round calculation.
As an optional implementation, fig. 8 illustrates another optional hardware example for performing round calculation provided in the embodiment of the present application, where the hardware structure shown in fig. 8 can complete one round of calculation in 1 beat, except for the first round of calculation. As shown in fig. 8, taking the current round calculation as the j-th round calculation as an example, the j-th round calculation generates the state words A_{i+1} to H_{i+1} of the next state; correspondingly, the state words of the previous state are A_i to H_i; and the multiple groups of selectors in the round calculation unit may include: a first group of selectors 810, a second group of selectors 820, and a third group of selectors 830;
the first group of selectors 810 may include a plurality of selectors (Mux); after the state words H_{i+1}, G_{i+1}, F_{i+1} and E_{i+1} are returned to the first pipeline stage through the bypass unit, the plurality of selectors may, when the next round of calculation after the j-th round is performed, select the returned state words H_{i+1}, G_{i+1}, F_{i+1} and E_{i+1} in the first pipeline stage as the state words used for the next round of calculation; and, when the current round calculation is performed, select the state words H_i, G_i, F_i and E_i in the first pipeline stage as the state words used for the current round calculation;
the second group of selectors 820 may include a plurality of selectors; after the state words A_{i+1}, B_{i+1}, C_{i+1} and D_{i+1} are returned to the first pipeline stage through the bypass unit, the plurality of selectors may, when the next round of calculation after the j-th round is performed, select the returned state words A_{i+1}, B_{i+1}, C_{i+1} and D_{i+1} in the first pipeline stage as the state words used for the next round of calculation; and, when the current round calculation is performed, select the state words A_i, B_i, C_i and D_i in the first pipeline stage as the state words used for the current round calculation;
the third group of selectors 830 may include a plurality of selectors; after the intermediate state words As and Ac, from which the state word A_{i+1} is derived, are returned to the first pipeline stage through the bypass, the plurality of selectors may, when the next round of calculation after the j-th round is performed, select the cyclic left shift results of the returned As and Ac (e.g., As<<<12 and Ac<<<12 shown in FIG. 8) in the first pipeline stage for the next round of calculation; and, when the current round calculation is performed, select the cyclic left shift result of the state word A_i (e.g., A_i<<<12 shown in FIG. 8) and 0 in the first pipeline stage for the current round calculation.
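The selection behaviour of the three groups of selectors can be sketched as simple two-way multiplexers. The sketch below only models which inputs are forwarded; the function names and the `bypass_enabled` flag are illustrative assumptions, not identifiers from the application.

```python
MASK = 0xFFFFFFFF  # 32-bit words

def rotl(x, n):
    """Cyclic left shift of a 32-bit word by n bits (n is taken modulo 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK

def select_state(bypass_enabled, previous_words, bypassed_words):
    """First/second selector groups: forward the bypassed next-state words when
    the next round follows directly, otherwise the previous-state words."""
    return bypassed_words if bypass_enabled else previous_words

def select_third_group(bypass_enabled, a_prev, a_s, a_c):
    """Third selector group: forward (As<<<12, Ac<<<12) from the bypass,
    otherwise (A_i<<<12, 0)."""
    if bypass_enabled:
        return rotl(a_s, 12), rotl(a_c, 12)
    return rotl(a_prev, 12), 0
```

In either case the third group delivers two 32-bit operands, so the downstream 3-input 2-output CSA unit sees the same number of inputs whether or not the bypass is active.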
In the process by which the calculation logic 730 obtains the state words generated by each round of calculation based on the selected state words, the calculation of the first part of the state words may be the same as described in the corresponding part above. For example, in the j-th round of calculation, the state words A_i, C_i, E_i and G_i selected in the first pipeline stage are each processed based on the left-assignment operator (←) to generate the state words B_{i+1}, D_{i+1}, F_{i+1} and H_{i+1} in the second pipeline stage; meanwhile, in the first pipeline stage, a cyclic left shift operation is performed on the selected state word B_i to generate the state word C_{i+1} in the second pipeline stage; and, in the first pipeline stage, a cyclic left shift operation is performed on the selected state word F_i to generate the state word G_{i+1} in the second pipeline stage.
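The six state words that need no arithmetic can be sketched as below. The shift amounts 9 and 19 are taken from the published SM3 standard and are an assumption with respect to this paragraph, which does not state them; the function name is illustrative.

```python
MASK = 0xFFFFFFFF  # 32-bit words

def rotl(x, n):
    """Cyclic left shift of a 32-bit word by n bits (n is taken modulo 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK

def simple_next_words(a, b, c, e, f, g):
    """Next-state words other than A and E for one SM3 round."""
    return {
        "B": a,            # B_{i+1} <- A_i
        "C": rotl(b, 9),   # C_{i+1} <- B_i <<< 9  (shift amount from the SM3 standard)
        "D": c,            # D_{i+1} <- C_i
        "F": e,            # F_{i+1} <- E_i
        "G": rotl(f, 19),  # G_{i+1} <- F_i <<< 19 (shift amount from the SM3 standard)
        "H": g,            # H_{i+1} <- G_i
    }
```

Fixed rotations are pure wiring in hardware, so these six words add no logic delay; only A_{i+1} and E_{i+1} lie on the critical path.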
For the calculation of the state words A and E, in order to reduce the critical path delay, the embodiment of the application may adjust the calculation logic of the two pipeline stages. For example, in the first pipeline stage, the calculation logic may determine, based on the selected state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i, the message word W_j and the message parameter W_j', the intermediate data used for calculating the intermediate variables TT1 and TT2; in the second pipeline stage, the intermediate variable TT1 is calculated from the intermediate data and the state word A_{i+1} is determined from the intermediate variable TT1; and, in the second pipeline stage, the intermediate variable TT2 is calculated from the intermediate data and the state word E_{i+1} is determined from the intermediate variable TT2.
In a more specific optional implementation of calculating the state words A and E, as shown in FIG. 8 and taking the j-th round of calculation as an example, in the first pipeline stage the calculation logic may perform the FF_j calculation on the state words A_i, B_i and C_i selected by the second group of selectors 820; the FF_j result, the state word D_i selected by the second group of selectors 820 and the message parameter W_j' are sent to a first 3-input 2-output CSA unit (e.g., CSA32_1 shown in FIG. 8) to perform the first CSA operation; meanwhile, a cyclic left shift operation is performed on the state word A_i selected by the second group of selectors 820 (e.g., the cyclic left shift by 12 bits, <<<12, shown in FIG. 8); the first CSA operation result of CSA32_1 and the cyclic left shift result of the state word A_i are stored in the registers of the second pipeline stage;
meanwhile, in the first pipeline stage, the calculation logic may send the cyclic left shift result of A_i and 0 selected by the third group of selectors 830, together with the cyclic left shift result of T_j (e.g., T_j<<<j shown in FIG. 8), to a second 3-input 2-output CSA unit (e.g., CSA32_2 shown in FIG. 8) to perform the second CSA operation; the second CSA operation result of CSA32_2 is input to a 32-bit first addition unit (e.g., Adder32_1 shown in FIG. 8) to perform the first addition operation; further, a cyclic left shift operation is performed on the first addition result of Adder32_1 (e.g., the cyclic left shift by 7 bits, <<<7, shown in FIG. 8), thereby obtaining the intermediate variable S1;
meanwhile, in the first pipeline stage, the calculation logic may perform a cyclic left shift operation on the state word E_i selected by the first group of selectors 810 (e.g., the cyclic left shift by 7 bits, <<<7, shown in FIG. 8) to obtain the intermediate variable S2; the calculation logic may also perform the GG_j calculation on the state words G_i, F_i and E_i selected by the first group of selectors 810; the GG_j result, the state word H_i selected by the first group of selectors 810 and the message word W_j are sent to a third 3-input 2-output CSA unit (e.g., CSA32_3 shown in FIG. 8) to perform the third CSA operation;
in the first pipeline stage, the calculation logic may send the intermediate variables S1 and S2 and the third CSA operation result of CSA32_3 to a first 4-input 2-output CSA unit (e.g., CSA42_1 shown in FIG. 8) to perform the fourth CSA operation; the fourth CSA operation result of CSA42_1 is stored in the registers of the second pipeline stage;
meanwhile, in the first pipeline stage, the calculation logic may send the intermediate variables S1 and S2 to a 32-bit second addition unit (e.g., Adder32_2 shown in FIG. 8) to obtain the intermediate variable SS1; the intermediate variable SS1 is stored in the registers of the second pipeline stage.
In the second pipeline stage, the calculation logic may send the fourth CSA operation result of CSA42_1 to a 32-bit third addition unit (e.g., Adder32_3 shown in FIG. 8) and perform the third addition operation to obtain the intermediate variable TT2; the intermediate variable TT2 is passed through the P_0 function to obtain the state word E_{i+1}; E_{i+1} is then returned to the first pipeline stage through the bypass;
in the second pipeline stage, the calculation logic may perform an exclusive-or (XOR) operation on the stored intermediate variable SS1 and the stored cyclic left shift result of the state word A_i to obtain the intermediate variable SS2; the intermediate variable SS2 and the first CSA operation result of CSA32_1 are sent to a fourth 3-input 2-output CSA unit (e.g., CSA32_4 shown in FIG. 8) to perform the fifth CSA operation and obtain the intermediate state words As and Ac; the intermediate state words As and Ac are, on the one hand, returned to the first pipeline stage and, on the other hand, sent to a 32-bit fourth addition unit (e.g., Adder32_4 shown in FIG. 8) to perform the fourth addition operation and obtain the intermediate variable TT1; the intermediate variable TT1 is then processed based on the left-assignment operator to obtain the state word A_{i+1}; A_{i+1} is returned to the first pipeline stage through the bypass.
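The TT2 path through CSA32_3, CSA42_1, Adder32_3 and P_0 can be sketched in the same style. The sketch models the 4-input 2-output unit as two chained 3-input 2-output compressors, which is one common construction and an assumption here; the P_0 definition follows the published SM3 standard, and S1 and S2 are taken as given inputs exactly as defined in the text above.

```python
MASK = 0xFFFFFFFF  # 32-bit words

def rotl(x, n):
    """Cyclic left shift of a 32-bit word by n bits (n is taken modulo 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK

def csa32(a, b, c):
    """3-input 2-output carry-save adder; outputs sum to a+b+c modulo 2**32."""
    return a ^ b ^ c, (((a & b) | (a & c) | (b & c)) << 1) & MASK

def csa42(a, b, c, d):
    """4-input 2-output compressor, modelled as two chained 3:2 compressors
    (assumption: the internal structure of CSA42_1 is not given in the text)."""
    s, carry = csa32(a, b, c)
    return csa32(s, carry, d)

def p0(x):
    """Permutation P_0 from the SM3 standard."""
    return x ^ rotl(x, 9) ^ rotl(x, 17)

def e_next(gg_out, h, w, s1, s2):
    """Returns (TT2, E_{i+1}) following the unit order named for FIG. 8."""
    s3, c3 = csa32(gg_out, h, w)      # stage 1: third CSA operation (CSA32_3)
    s4, c4 = csa42(s3, c3, s1, s2)    # stage 1: fourth CSA operation (CSA42_1)
    tt2 = (s4 + c4) & MASK            # stage 2: third addition (Adder32_3)
    return tt2, p0(tt2)
```

As on the TT1 path, the compressors defer carry propagation so that only one full-width adder remains between the stage registers and P_0.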
The above describes an example process in which the calculation logic 730 selects the state words A_i to H_i to perform the j-th round of calculation. When the next round of calculation after the j-th round is performed, the embodiment of the application may, based on the bypass mechanism, select the state words A_{i+1} to H_{i+1} accordingly to perform the round calculation; the implementation of that round calculation may likewise refer to the description of the corresponding parts above, with the message word and the message parameter adjusted accordingly.
It can be seen that, for the calculation of the state words A and E, the embodiment of the application may return the intermediate state words A_S and A_C of the state word A to the first pipeline stage; A_S and A_C are then each cyclically shifted left by 12 bits and selected against (A_i, 0) in the third group of selectors 830; if the next round of calculation is performed, the bypass is enabled and the third group of selectors 830 selects (A_S, A_C) for the round calculation, otherwise it selects (A_i, 0) for the round calculation; the selection result of the third group of selectors 830 and T_j<<<j are compressed by CSA32_2, and the compression result of CSA32_2 is then sent to Adder32_1; the result of Adder32_1 is cyclically shifted left by 7 bits to obtain the intermediate variable S1; on the other hand, after the first group of selectors 810 performs a two-way selection between E_{i+1} and E_i, the selection result is cyclically shifted left by 7 bits to obtain the intermediate variable S2;
further, S1, S2 and the two output results of CSA32_3 are sent to CSA42_1 to generate two compression results; after the two compression results generated by CSA42_1 are stored in the FX2 register, they pass through Adder32_3 and the P_0 function to obtain the one-round calculation result of the state word E;
meanwhile, S1 and S2 generate the intermediate variable SS1 through Adder32_2, and the intermediate variable SS1 is stored in the FX2 register; then, in the second pipeline stage, the intermediate variable SS2 is obtained by a 32-bit exclusive-or (XOR) operation on the intermediate variable SS1 and A_i<<<12; the intermediate variable SS2 and the operation result of CSA32_1 stored in the FX2 register are compressed by CSA32_4 to generate A_S and A_C; A_S and A_C are bypassed to the first pipeline stage and are simultaneously input to Adder32_4 to obtain the one-round calculation result of the state word A.
With the SM3 processing scheme provided by the embodiment of the application, round calculation can be performed at a vector granularity whose bit width equals that of the round calculation result (for example, at a vector granularity of 256 bits), so that the bit width of the operand registers of the vector component is fully utilized and the processing speed of SM3 is improved. Furthermore, because data dependence exists between consecutive rounds of the SM3 algorithm, the rounds can only be executed serially and the calculation throughput is limited. The embodiment of the application therefore further provides a microarchitecture design that supports an internal bypass: the state words and intermediate state words generated by one round of calculation in the second pipeline stage are returned to the first pipeline stage through the bypass, which removes the stall caused by the data dependence between consecutive rounds, so that consecutive rounds of the SM3 algorithm can be executed in a pipelined manner. Except for the first round, a throughput of one round per beat can be achieved, which further improves the processing speed and the calculation performance of the SM3 algorithm.
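The 256-bit granularity follows from the size of the round calculation result: eight 32-bit state words A to H. A minimal sketch of packing and unpacking such an operand is shown below; the lane order (A in the most significant 32 bits) and the function names are assumptions made for illustration, since the operand layout is not specified in this passage.

```python
MASK = 0xFFFFFFFF  # 32-bit words

def pack256(words):
    """Pack eight 32-bit words into one 256-bit integer.
    Assumed lane order: words[0] occupies the most significant 32 bits."""
    if len(words) != 8:
        raise ValueError("expected exactly eight 32-bit words")
    value = 0
    for w in words:
        value = (value << 32) | (w & MASK)
    return value

def unpack256(value):
    """Inverse of pack256: split a 256-bit integer into eight 32-bit words."""
    return [(value >> (32 * (7 - i))) & MASK for i in range(8)]
```

The same packing would hold a first source operand of four message words and four message parameters, which is likewise 8 x 32 = 256 bits.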
Further, the embodiment of the application also provides a chip, and the chip can comprise the processor provided by the embodiment of the application.
Further, the embodiment of the application also provides an electronic device, such as a terminal device or a server device, which may include the processor provided in the embodiment of the application.
The foregoing describes a number of embodiments provided by the present application. The alternatives presented in the various embodiments may, where they do not conflict, be combined and cross-referenced with one another to extend to a variety of possible embodiments, all of which may be considered embodiments disclosed by the present application.
Although the embodiments of the present application are disclosed above, the present application is not limited thereto. Various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the invention, and the scope of the invention shall be defined by the appended claims.

Claims (25)

1. A method for processing an SM3 algorithm, comprising:
reading a message word and a message parameter corresponding to the current round of calculation from a first source operand of a first source operand register; wherein the first source operand includes a plurality of message words and message parameters, and the bit width of the first source operand is the same as the bit width of the round calculation result;
according to a single round calculation instruction, calculating the state word of the next state by using the state word of the previous state and the message word and message parameter corresponding to the current round calculation;
storing the status word of the next state in a first destination operand of a first destination operand register, wherein the bit width of the first destination operand is the same as the bit width of a round calculation result;
returning the state word of the next state generated by the current round calculation from the second pipeline stage to the first pipeline stage through a bypass mechanism;
if the next round of calculation after the current round is performed, selecting, in the first pipeline stage, the returned state word of the next state as the state word used for the next round of calculation; if the current round calculation is performed, selecting, in the first pipeline stage, the state word of the previous state as the state word used for the current round calculation.
2. The method of claim 1, wherein the current round of calculation is a j-th round of calculation, and the first source operand including a plurality of message words and message parameters comprises:
the first source operand includes message words W_j, W_{j+1}, W_{j+2}, W_{j+3} and message parameters W_j', W_{j+1}', W_{j+2}', W_{j+3}', wherein j represents the current round number;
the reading of the message word and the message parameter corresponding to the current round of calculation from the first source operand of the first source operand register comprises:
reading, from the first source operand, the message word W_j and the message parameter W_j' corresponding to the j-th round of calculation, wherein j represents the current round number.
3. The method of claim 2, wherein the state words comprise A, B, C, D, E, F, G, H; the state words of the next state calculated by the j-th round of calculation comprise: A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1}; and the state words of the previous state required by the j-th round of calculation comprise: A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i.
4. The method of claim 3, wherein the calculating, according to a single round calculation instruction, of the state word of the next state by using the state word of the previous state and the message word and message parameter corresponding to the current round calculation comprises:
according to the single round calculation instruction, calculating the state words A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1} of the next state by using the state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i of the previous state, the message word W_j and the message parameter W_j';
the storing of the state word of the next state in the first destination operand of the first destination operand register comprises:
storing the state words A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1} of the next state in the first destination operand of the first destination operand register.
5. The method as recited in claim 1, further comprising:
reading a message word for message expansion from a second source operand of the second source operand register and a third source operand of the third source operand register; the bit widths of the second source operand and the third source operand are the same as the bit width of the round calculation result, the second source operand comprises a first number of message words with the largest obtained sequence number, and the third source operand comprises a first number of message words before the second source operand;
according to a single message expansion instruction, generating by expansion, using the read message words, a first number of message words following the second source operand;
storing the message word generated by the expansion in a second destination operand of a second destination operand register; wherein the bit width of the second destination operand is the same as the bit width of the round calculation result.
6. The method of claim 5, wherein the second source operand comprises a first number of message words having a highest sequence number obtained comprises: the second source operand comprises 8 message words with the largest obtained sequence numbers;
The third source operand comprising a first number of message words preceding the second source operand comprising: the third source operand includes 8 message words preceding the second source operand;
the generating by expansion, according to a single message expansion instruction and using the read message words, of a first number of message words following the second source operand comprises:
according to the message expansion instruction, generating by expansion, using the read message words, the 8 message words following the second source operand.
7. The method as recited in claim 4, further comprising:
returning the intermediate state words As and Ac obtained in the second pipeline stage to the first pipeline stage through a bypass; the intermediate state words As and Ac being used to directly generate the one-round calculation result of the state word A;
if the next round of calculation after the j-th round is performed, selecting, in the first pipeline stage, the cyclic left shift results of the intermediate state words As and Ac for the next round of calculation; if the current round calculation is performed, selecting, in the first pipeline stage, the state word A_i and 0 for the current round calculation.
8. The method of claim 7, wherein the calculating, according to the single round calculation instruction, of the state words A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1} of the next state by using the state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i of the previous state, the message word W_j and the message parameter W_j' comprises:
according to the single round calculation instruction, calculating at the current round, and obtaining a state word generated by the calculation of the current round based on the selected state word, the corresponding message word and the message parameter;
the method further comprises the steps of:
and according to the single round calculation instruction, calculating in the next round of the current round, and obtaining the state word generated in the next round of calculation based on the selected state word, the corresponding message word and the message parameter.
9. The method of claim 8, wherein the obtaining the status word generated by the current round calculation based on the selected status word, the corresponding message word, and the message parameter at the current round calculation based on the single round calculation instruction comprises:
for the j-th round of calculation, in the first pipeline stage, processing the selected A_i, C_i, E_i and G_i respectively based on a left-assignment operator to obtain B_{i+1}, D_{i+1}, F_{i+1} and H_{i+1} in the second pipeline stage; and, in the first pipeline stage, performing a cyclic left shift operation on the selected B_i to obtain C_{i+1} in the second pipeline stage; and, in the first pipeline stage, performing a cyclic left shift operation on the selected F_i to obtain G_{i+1} in the second pipeline stage;
and, according to the state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i selected in the first pipeline stage, the message word W_j and the message parameter W_j', generating the state words A_{i+1} and E_{i+1} in the second pipeline stage.
10. The method of claim 9, wherein the generating of the state words A_{i+1} and E_{i+1} in the second pipeline stage according to the state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i selected in the first pipeline stage, the message word W_j and the message parameter W_j' comprises:
in the first pipeline stage, determining, according to the selected state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i, the message word W_j and the message parameter W_j', intermediate data for calculating the intermediate variables TT1 and TT2;
in the second pipeline stage, calculating the intermediate variable TT1 from the intermediate data and determining the state word A_{i+1} from the intermediate variable TT1; and, in the second pipeline stage, calculating the intermediate variable TT2 from the intermediate data and determining the state word E_{i+1} from the intermediate variable TT2.
11. The method of claim 10, wherein the determining, in the first pipeline stage and according to the selected state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i, the message word W_j and the message parameter W_j', of intermediate data for calculating the intermediate variables TT1 and TT2 comprises:
in the first pipeline stage, performing the FF_j calculation on the selected state words A_i, B_i and C_i; and performing a first CSA operation on the FF_j result, the selected state word D_i and the message parameter W_j', wherein CSA is a carry-save adder;
and performing a second CSA operation on the cyclic left shift result of the selected A_i, 0, and the cyclic left shift result of T_j, performing a first addition operation on the second CSA operation result, and performing a cyclic left shift operation on the first addition result to obtain an intermediate variable S1;
and performing a cyclic left shift operation on the selected state word E_i to obtain an intermediate variable S2; performing the GG_j calculation on the selected state words G_i, F_i and E_i, and performing a third CSA operation on the GG_j result, the selected state word H_i and the message word W_j;
and performing a fourth CSA operation on the third CSA operation result and the intermediate variables S1 and S2 to obtain a fourth CSA operation result; and performing a second addition operation on the intermediate variables S1 and S2 to obtain an intermediate variable SS1.
12. The method of claim 11, wherein the calculating, in the second pipeline stage, of the intermediate variable TT1 from the intermediate data and the determining of the state word A_{i+1} from the intermediate variable TT1 comprises:
performing an exclusive-or operation on the intermediate variable SS1 and the cyclic left shift result of the state word A_i to obtain an intermediate variable SS2; performing a fifth CSA operation on the intermediate variable SS2 and the first CSA operation result to obtain intermediate state words As and Ac; performing a fourth addition operation on the intermediate state words As and Ac to obtain the intermediate variable TT1; and processing the intermediate variable TT1 based on the left-assignment operator to obtain the state word A_{i+1};
the calculating, in the second pipeline stage, of the intermediate variable TT2 from the intermediate data and the determining of the state word E_{i+1} from the intermediate variable TT2 comprises:
in the second pipeline stage, performing a third addition operation on the fourth CSA operation result to obtain the intermediate variable TT2; and passing the intermediate variable TT2 through the P_0 function to obtain the state word E_{i+1}.
13. A processor, comprising: a round calculation unit and an operand register; the operand register includes: a first source operand register and a first destination operand register; wherein the first source operand register sets a first source operand, the first source operand including a plurality of message words and message parameters, and the first source operand having a bit width that is the same as a bit width of a round calculation result; the first destination operand register is used for setting a first destination operand, and the bit width of the first destination operand is the same as that of a round calculation result;
the processor is configured with a single wheel calculation instruction; the round calculation unit is used for reading a message word and a message parameter corresponding to the current round calculation from a first source operand of the first source operand register; according to a single round calculation instruction, calculating a corresponding message word and a message parameter by using a state word of the previous state and the current round, and calculating a state word of the next state; storing a status word of the next state in a first destination operand of a first destination operand register; returning the state word of the next state generated by the calculation of the current wheel from the second-stage flowing water to the first-stage flowing water through a bypass mechanism; if the next round of calculation of the current round is carried out, in the first-stage flowing water, selecting a returned state word of the next state as the state word used for the next round of calculation; if the current wheel calculation is performed, in the first stage of the pipeline, a state word of the previous state is selected and used as the state word used for the current wheel calculation.
14. The processor of claim 13, wherein the current round of calculation is the j-th round of calculation, and the first source operand including a plurality of message words and message parameters comprises:
the first source operand includes message words W_j, W_{j+1}, W_{j+2}, W_{j+3} and message parameters W_j', W_{j+1}', W_{j+2}', W_{j+3}', where j is the index of the current round;
the round calculation unit being configured to read, from the first source operand in the first source operand register, the message word and message parameter corresponding to the current round of calculation comprises:
reading, from the first source operand, the message word W_j and the message parameter W_j' corresponding to the j-th round of calculation.
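Claim 14 can be illustrated with a small software model of the first source operand. The sketch below is only an illustration: the lane layout (message words in the low four 32-bit lanes, message parameters in the high four) and the function names `pack_operand` and `read_round_inputs` are assumptions, not something the claims specify.

```python
MASK32 = 0xFFFFFFFF

def pack_operand(words, params):
    """Pack four message words and four message parameters into one 256-bit integer.

    Assumed (hypothetical) layout: lane k (bits 32k .. 32k+31) holds W_{j+k}
    for k = 0..3, and lane 4+k holds W'_{j+k}.
    """
    if len(words) != 4 or len(params) != 4:
        raise ValueError("expected four message words and four message parameters")
    operand = 0
    for lane, value in enumerate(list(words) + list(params)):
        operand |= (value & MASK32) << (32 * lane)
    return operand

def read_round_inputs(operand, k):
    """Return (W_{j+k}, W'_{j+k}) for round offset k in 0..3."""
    if not 0 <= k <= 3:
        raise ValueError("round offset must be in 0..3")
    w = (operand >> (32 * k)) & MASK32
    w_prime = (operand >> (32 * (4 + k))) & MASK32
    return w, w_prime
```

Eight 32-bit lanes give 256 bits, which matches the width of the eight 32-bit state words that make up a round calculation result.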
15. The processor of claim 14, wherein the state words comprise A, B, C, D, E, F, G, H; the state words of the next state calculated by the j-th round of calculation comprise: A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1}; the state words of the previous state required by the j-th round of calculation comprise: A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i;
the round calculation unit being configured to calculate, according to the single round calculation instruction, the state words of the next state using the state words of the previous state and the message word and message parameter corresponding to the current round comprises:
according to the single round calculation instruction, using the state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i of the previous state, the message word W_j and the message parameter W_j' to calculate the state words A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1} of the next state.
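For reference, the state update described in claim 15 corresponds to one round of the SM3 compression function as defined in the public SM3 standard. The following is a minimal software sketch of that round, not the claimed hardware; the names `rotl` and `sm3_round` are chosen here for illustration.

```python
MASK32 = 0xFFFFFFFF

def rotl(x, n):
    """Cyclic left shift of a 32-bit word by n bits (n is taken modulo 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def sm3_round(state, j, w, w_prime):
    """Apply round j (0..63) of SM3 to state (A..H) using W_j and W_j'."""
    a, b, c, d, e, f, g, h = state
    t = 0x79CC4519 if j < 16 else 0x7A879D8A      # round constant T_j
    if j < 16:
        ff = a ^ b ^ c                            # FF_j for rounds 0..15
        gg = e ^ f ^ g                            # GG_j for rounds 0..15
    else:
        ff = (a & b) | (a & c) | (b & c)          # FF_j for rounds 16..63
        gg = (e & f) | (~e & g & MASK32)          # GG_j for rounds 16..63
    a12 = rotl(a, 12)
    ss1 = rotl((a12 + e + rotl(t, j)) & MASK32, 7)
    ss2 = ss1 ^ a12
    tt1 = (ff + d + ss2 + w_prime) & MASK32
    tt2 = (gg + h + ss1 + w) & MASK32
    p0 = tt2 ^ rotl(tt2, 9) ^ rotl(tt2, 17)       # permutation P_0
    # New state: A <- TT1, B <- A, C <- B<<<9, D <- C,
    #            E <- P0(TT2), F <- E, G <- F<<<19, H <- G
    return (tt1, a, rotl(b, 9), c, p0, e, rotl(f, 19), g)
```

Only A_{i+1} and E_{i+1} require arithmetic; the other six words are copies or fixed rotations of the previous state, which is what makes the two-stage split in the later claims possible.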
16. The processor of claim 13, wherein the processor further comprises: a message expansion unit; the operand registers further include: a second source operand register, a third source operand register, and a second destination operand register;
wherein the second source operand register holds a second source operand and the third source operand register holds a third source operand; the bit widths of the second source operand and the third source operand are the same as the bit width of the round calculation result; the second source operand includes a first number of already-obtained message words with the largest sequence numbers, and the third source operand includes the first number of message words immediately preceding those of the second source operand;
the processor is configured with a single message expansion instruction; the message expansion unit is configured to: read the message words used for message expansion from the second source operand in the second source operand register and the third source operand in the third source operand register; according to the single message expansion instruction, use the read message words to generate, by expansion, the first number of message words that follow those of the second source operand; and store the message words generated by the expansion in a second destination operand of the second destination operand register.
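The message expansion of claim 16 can be sketched in software using the SM3 expansion recurrence from the public standard. The sketch assumes the "first number" is 8, because eight 32-bit words fill a 256-bit operand; the claim itself does not fix this value, and the function name `expand_message_words` is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def rotl(x, n):
    """Cyclic left shift of a 32-bit word by n bits (n is taken modulo 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def p1(x):
    """SM3 permutation P_1."""
    return x ^ rotl(x, 15) ^ rotl(x, 23)

def expand_message_words(older, newest):
    """Generate the next eight message words.

    older  : W_{n-16} .. W_{n-9}  (assumed content of the third source operand)
    newest : W_{n-8}  .. W_{n-1}  (assumed content of the second source operand)
    returns: W_n .. W_{n+7}
    """
    if len(older) != 8 or len(newest) != 8:
        raise ValueError("expected eight words per operand")
    window = list(older) + list(newest)   # window[k] holds W_{j-16+k}
    out = []
    for _ in range(8):
        # W_j = P1(W_{j-16} ^ W_{j-9} ^ (W_{j-3} <<< 15)) ^ (W_{j-13} <<< 7) ^ W_{j-6}
        w = (p1(window[0] ^ window[7] ^ rotl(window[13], 15))
             ^ rotl(window[3], 7) ^ window[10])
        out.append(w)
        window = window[1:] + [w]         # slide the 16-word window forward
    return out
```

Because W_j depends on W_{j-3}, the later words of each batch depend on the earlier ones, so the loop produces them in order.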
17. The processor of claim 15, wherein the round calculation unit comprises:
a bypass unit, configured to return the state words of the next state generated by the current round of calculation from the second pipeline stage to the first pipeline stage;
multiple groups of selectors, configured to: when the round following the current round is calculated, select, in the first pipeline stage, the returned state words of the next state as the state words used for that following round; and when the current round is calculated, select, in the first pipeline stage, the state words of the previous state as the state words used for the current round;
calculation logic, configured to: when the current round is calculated, obtain, according to the single round calculation instruction, the state words generated by the current round based on the state words selected by the multiple groups of selectors and the corresponding message word and message parameter; and when the round following the current round is calculated, obtain, according to the single round calculation instruction, the state words generated by that following round based on the selected state words and the corresponding message word and message parameter.
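The interplay of bypass unit and selectors in claim 17 can be modelled behaviourally. The sketch below is a simplified assumption-laden model, not the claimed circuit: `run_rounds` is a hypothetical helper, `round_fn` stands in for the calculation logic, and pipeline timing is not modelled.

```python
def run_rounds(round_fn, register_state, round_inputs):
    """Run back-to-back rounds, choosing each round's input state like a multiplexer.

    round_fn       : callable (state, w, w_prime) -> next state (the calculation logic)
    register_state : state of the previous state, as read from the operand register
    round_inputs   : iterable of (W_j, W_j') pairs, one per round
    Returns (final_state, sources), where sources records which selector input
    was used in each round.
    """
    bypassed = None                       # value returned from stage 2 to stage 1
    sources = []
    for w, w_prime in round_inputs:
        if bypassed is None:
            selected = register_state     # current round: state of the previous state
            sources.append("register")
        else:
            selected = bypassed           # following round: bypassed next state
            sources.append("bypass")
        bypassed = round_fn(selected, w, w_prime)
    final_state = register_state if bypassed is None else bypassed
    return final_state, sources
```

With a toy round function such as `lambda s, w, wp: s + w + wp`, the first round reads the register value and every later round consumes the bypassed result without a register write-back in between.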
18. The processor of claim 17, wherein the bypass unit is further configured to return the intermediate state words As and Ac obtained in the second pipeline stage to the first pipeline stage; the intermediate state words As and Ac are used to directly generate the round calculation result of state word A;
the multiple groups of selectors are further configured to: when the round following the j-th round is calculated, select, in the first pipeline stage, the cyclic left shift results of the intermediate state words As and Ac for that following round; and when the current round is calculated, select, in the first pipeline stage, the state word A_i and 0 for the current round.
19. The processor of claim 18, wherein the multiple groups of selectors comprise:
a first group of selectors, including a plurality of selectors, which, when the round following the j-th round is calculated, select, in the first pipeline stage, the returned state words H_{i+1}, G_{i+1}, F_{i+1} and E_{i+1} as the state words used for that following round; and which, when the current round is calculated, select, in the first pipeline stage, the state words H_i, G_i, F_i and E_i as the state words used for the current round;
a second group of selectors, including a plurality of selectors, which, when the round following the j-th round is calculated, select, in the first pipeline stage, the returned state words A_{i+1}, B_{i+1}, C_{i+1} and D_{i+1} as the state words used for that following round; and which, when the current round is calculated, select, in the first pipeline stage, the state words A_i, B_i, C_i and D_i as the state words used for the current round;
a third group of selectors, including a plurality of selectors, which, when the round following the j-th round is calculated, select, in the first pipeline stage, the cyclic left shift results of the returned intermediate state words As and Ac for that following round; and which, when the current round is calculated, select, in the first pipeline stage, the state word A_i and 0 for the current round.
20. The processor of claim 19, wherein the calculation logic being configured to obtain, when the current round is calculated and according to the single round calculation instruction, the state words generated by the current round based on the state words selected by the multiple groups of selectors and the corresponding message word and message parameter comprises:
for the j-th round of calculation, in the first pipeline stage, processing the selected A_i, C_i, E_i and G_i respectively based on the left-assignment operator (←), and latching the results in the second pipeline stage to obtain B_{i+1}, D_{i+1}, F_{i+1} and H_{i+1}; and, in the first pipeline stage, performing a cyclic left shift on the selected B_i to obtain C_{i+1} in the second pipeline stage; and, in the first pipeline stage, performing a cyclic left shift on the selected F_i to obtain G_{i+1} in the second pipeline stage;
and, based on A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i selected in the first pipeline stage, the message word W_j and the message parameter W_j', generating the state words A_{i+1} and E_{i+1} in the second pipeline stage.
21. The processor of claim 20, wherein the calculation logic being configured to generate the state words A_{i+1} and E_{i+1} in the second pipeline stage, based on A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i selected in the first pipeline stage, the message word W_j and the message parameter W_j', comprises:
in the first pipeline stage, determining intermediate data for calculating intermediate variables TT1 and TT2 according to the selected state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i, the message word W_j and the message parameter W_j'; in the second pipeline stage, calculating the intermediate variable TT1 from the intermediate data and determining the state word A_{i+1} from the intermediate variable TT1; and, in the second pipeline stage, calculating the intermediate variable TT2 from the intermediate data and determining the state word E_{i+1} from the intermediate variable TT2.
22. The processor of claim 21, wherein the calculation logic being configured to determine, in the first pipeline stage, intermediate data for calculating intermediate variables TT1 and TT2 according to the selected state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i, the message word W_j and the message parameter W_j' comprises:
in the first pipeline stage, performing the FF_j operation on the state words A_i, B_i and C_i selected by the second group of selectors; feeding the FF_j result, the state word D_i selected by the second group of selectors, and the message parameter W_j' into a first 3-input 2-output CSA unit to perform a first CSA operation, where CSA denotes a carry-save adder;
and feeding the cyclic left shift result of A_i selected by the third group of selectors, 0, and the cyclic left shift result of T_j into a second 3-input 2-output CSA unit to perform a second CSA operation; feeding the second CSA result into a 32-bit first addition unit to perform a first addition operation; performing a cyclic left shift on the first addition result to obtain an intermediate variable S1;
and performing a cyclic left shift on the state word E_i selected by the first group of selectors to obtain an intermediate variable S2; performing the GG_j operation on the state words G_i, F_i and E_i selected by the first group of selectors; feeding the GG_j result, the message word W_j, and the state word H_i selected by the first group of selectors into a third 3-input 2-output CSA unit to perform a third CSA operation; and feeding the intermediate variables S1 and S2 and the third CSA result into a first 4-input 2-output CSA unit to perform a fourth CSA operation;
and feeding the intermediate variables S1 and S2 into a 32-bit second addition unit to perform a second addition operation to obtain an intermediate variable SS1.
23. The processor of claim 22, wherein the calculation logic being configured to calculate, in the second pipeline stage, the intermediate variable TT1 from the intermediate data and determine the state word A_{i+1} from the intermediate variable TT1 comprises:
in the second pipeline stage, performing an exclusive-OR operation on the intermediate variable SS1 and the cyclic left shift result of the state word A_i selected by the second group of selectors to obtain an intermediate variable SS2; feeding the intermediate variable SS2 and the first CSA result into a fourth 3-input 2-output CSA unit to perform a fifth CSA operation to obtain the intermediate state words As and Ac; feeding the intermediate state words As and Ac into a 32-bit fourth addition unit to perform a fourth addition operation to obtain the intermediate variable TT1; and, based on the left-assignment operator (←), processing the intermediate variable TT1 to obtain the state word A_{i+1};
the calculation logic being configured to calculate, in the second pipeline stage, the intermediate variable TT2 from the intermediate data and determine the state word E_{i+1} from the intermediate variable TT2 comprises:
feeding the fourth CSA result into a 32-bit third addition unit to perform a third addition operation to obtain the intermediate variable TT2; and passing the intermediate variable TT2 through the P_0 function to obtain the state word E_{i+1}.
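The carry-save additions in claims 22 and 23 rely on the identity that a 3-input 2-output CSA turns three addends into a sum word and a carry word whose ordinary sum equals the sum of the three inputs modulo 2^32. The sketch below illustrates that identity for the TT1 and TT2 paths. It is a software illustration under assumptions: the 4-input 2-output unit is modelled as two chained 3:2 stages, S1 and S2 are taken as given inputs, and all function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def csa32(x, y, z):
    """3-input 2-output carry-save adder on 32-bit words: returns (sum, carry)."""
    s = x ^ y ^ z
    c = (((x & y) | (x & z) | (y & z)) << 1) & MASK32
    return s, c

def csa42(w, x, y, z):
    """4-input 2-output compressor, modelled here as two chained 3:2 CSAs."""
    s, c = csa32(w, x, y)
    return csa32(s, c, z)

def tt1_via_csa(ff, d, w_prime, ss2):
    """TT1 = FF_j + D_i + W_j' + SS2 (mod 2^32) using two CSAs and one adder."""
    s1, c1 = csa32(ff, d, w_prime)        # first CSA operation (stage 1)
    a_s, a_c = csa32(ss2, s1, c1)         # fifth CSA operation -> As, Ac (stage 2)
    return (a_s + a_c) & MASK32           # fourth addition operation -> TT1

def tt2_via_csa(gg, w, h, s1, s2):
    """TT2 = GG_j + W_j + H_i + S1 + S2 (mod 2^32) using a 3:2 CSA, a 4:2 CSA, one adder."""
    s3, c3 = csa32(gg, w, h)              # third CSA operation
    s4, c4 = csa42(s1, s2, s3, c3)        # fourth CSA operation
    return (s4 + c4) & MASK32             # third addition operation -> TT2
```

Each CSA stage costs only a few gate delays because no carry ripples across bit positions; the single carry-propagating addition is deferred to the end of each path, which is the usual motivation for this structure.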
24. A chip comprising a processor as claimed in any one of claims 13 to 23.
25. An electronic device comprising the chip of claim 24.
CN202210493013.1A 2022-05-07 2022-05-07 SM3 algorithm processing method, processor, chip and electronic equipment Active CN114978473B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210493013.1A CN114978473B (en) 2022-05-07 2022-05-07 SM3 algorithm processing method, processor, chip and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210493013.1A CN114978473B (en) 2022-05-07 2022-05-07 SM3 algorithm processing method, processor, chip and electronic equipment

Publications (2)

Publication Number Publication Date
CN114978473A CN114978473A (en) 2022-08-30
CN114978473B true CN114978473B (en) 2024-03-01

Family

ID=82981765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210493013.1A Active CN114978473B (en) 2022-05-07 2022-05-07 SM3 algorithm processing method, processor, chip and electronic equipment

Country Status (1)

Country Link
CN (1) CN114978473B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116318660B (en) * 2023-01-12 2023-12-08 成都海泰方圆科技有限公司 Message expansion and compression method and related device

Citations (6)

Publication number Priority date Publication date Assignee Title
CN101430737A (en) * 2008-11-19 2009-05-13 西安电子科技大学 Wavelet transformation-improved VLSI structure design method
CN106575215A (en) * 2014-09-04 2017-04-19 英特尔公司 Emulation of fused multiply-add operations
CN108427575A (en) * 2018-02-01 2018-08-21 深圳市安信智控科技有限公司 Fully pipelined architecture SHA-2 extension of message optimization methods
CN112367158A (en) * 2020-11-06 2021-02-12 海光信息技术股份有限公司 Method for accelerating SM3 algorithm, processor, chip and electronic equipment
CN113076277A (en) * 2021-03-26 2021-07-06 大唐微电子技术有限公司 Method and device for realizing pipeline scheduling, computer storage medium and terminal
CN113282947A (en) * 2021-07-21 2021-08-20 杭州安恒信息技术股份有限公司 Data encryption method and device based on SM4 algorithm and computer platform

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9658854B2 (en) * 2014-09-26 2017-05-23 Intel Corporation Instructions and logic to provide SIMD SM3 cryptographic hashing functionality


Also Published As

Publication number Publication date
CN114978473A (en) 2022-08-30

Similar Documents

Publication Publication Date Title
JP5328186B2 (en) Data processing system and data processing method
JP3703092B2 (en) Hardware for modular multiplication using multiple nearly identical processor elements
CN112367158B (en) Method for accelerating SM3 algorithm, processor, chip and electronic equipment
Wang et al. FPGA implementation of a large-number multiplier for fully homomorphic encryption
FR2867579A1 (en) Montgomery modular multiplier for computing system, selects one of n-bit modulus numbers as a modulus product and one of n-bit multiplicand numbers as partial product
CN115344237B (en) Data processing method combining Karatsuba and Montgomery modular multiplication
JP4612680B2 (en) Apparatus and method for performing MD5 digesting
CN111612622B (en) Circuit and method for performing a hashing algorithm
CN114978473B (en) SM3 algorithm processing method, processor, chip and electronic equipment
Seo et al. Optimized implementation of SIKE round 2 on 64-bit ARM Cortex-A processors
CN115525342A (en) Acceleration method of SM3 password hash algorithm and instruction set processor
Lee et al. An efficient DPA countermeasure with randomized montgomery operations for DF-ECC processor
US5983252A (en) Pseudo-random number generator capable of efficiently exploiting processors having instruction-level parallelism and the use thereof for encryption
Michail et al. A top-down design methodology for ultrahigh-performance hashing cores
Chaves et al. Secure hashing: Sha-1, sha-2, and sha-3
CN109933304B (en) Rapid Montgomery modular multiplier operation optimization method suitable for national secret sm2p256v1 algorithm
US7240204B1 (en) Scalable and unified multiplication methods and apparatus
Pornin Optimized binary gcd for modular inversion
JP2004519017A (en) Method and apparatus for multiplying coefficients
CN116318660B (en) Message expansion and compression method and related device
El-Razouk Input-latency free versatile bit-serial GF (2 m) polynomial basis multiplication
CN213518334U (en) Circuit for executing Hash algorithm, computing chip and encrypted currency mining machine
CN114553424A (en) ZUC-256 stream cipher light-weight hardware system
JP2002251137A (en) System and method for modula multiplication
CN213482935U (en) Circuit for executing Hash algorithm, computing chip and encrypted currency mining machine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant