CN114978473B

CN114978473B - SM3 algorithm processing method, processor, chip and electronic equipment

Info

Publication number: CN114978473B
Application number: CN202210493013.1A
Authority: CN
Inventors: 姚涛
Original assignee: Haiguang Information Technology Co Ltd
Current assignee: Haiguang Information Technology Co Ltd
Priority date: 2022-05-07
Filing date: 2022-05-07
Publication date: 2024-03-01
Anticipated expiration: 2042-05-07
Also published as: CN114978473A

Abstract

Embodiments of the present application provide a processing method of an SM3 algorithm, a processor, a chip and electronic equipment. The method comprises: reading the message word and the message parameter corresponding to the current round of calculation from a first source operand of a first source operand register, wherein the first source operand includes a plurality of message words and message parameters, and the bit width of the first source operand is the same as the bit width of the round calculation result; calculating, according to a single round calculation instruction, the state word of the next state by using the state word of the previous state and the message word and message parameter corresponding to the current round of calculation; and storing the state word of the next state in a first destination operand of a first destination operand register, wherein the bit width of the first destination operand is the same as the bit width of the round calculation result. Embodiments of the present application can improve the processing speed of the SM3 algorithm.

Description

SM3 algorithm processing method, processor, chip and electronic equipment

Technical Field

The embodiment of the application relates to the technical field of cryptography, in particular to a processing method, a processor, a chip and electronic equipment of an SM3 algorithm.

Background

The SM3 algorithm is a cryptographic hash function standard that, for a message of length less than $2^{64}$ bits, produces a 256-bit hash value; the hash value is the message digest (a bit string) that is output when the hash algorithm is applied to the message. The SM3 algorithm is essentially a cryptographic hash algorithm, and can be used in scenarios requiring cryptographic security in commercial cryptography applications, such as digital signature and verification, generation and verification of message authentication codes, and random number generation. Given the wide application of the SM3 algorithm, how to improve its processing speed has become a problem to be solved urgently by those skilled in the art.

Disclosure of Invention

In view of this, the embodiments of the present application provide a processing method, a processor, a chip and an electronic device for an SM3 algorithm, so as to improve the processing speed of the SM3 algorithm.

In order to achieve the above purpose, the embodiments of the present application provide the following technical solutions.

In a first aspect, an embodiment of the present application provides a processing method of an SM3 algorithm, including:

reading a message word and a message parameter corresponding to the current round of calculation from a first source operand of a first source operand register; wherein the first source operand includes a plurality of message words and message parameters, and the bit width of the first source operand is the same as the bit width of the round calculation result;

calculating, according to a single round calculation instruction, the state word of the next state by using the state word of the previous state and the message word and message parameter corresponding to the current round of calculation;

and storing the state word of the next state in a first destination operand of a first destination operand register, wherein the bit width of the first destination operand is the same as the bit width of the round calculation result.

In a second aspect, embodiments of the present application provide a processor, comprising: a round calculation unit and operand registers; the operand registers include a first source operand register and a first destination operand register; wherein the first source operand register is used for setting a first source operand, the first source operand includes a plurality of message words and message parameters, and the bit width of the first source operand is the same as the bit width of the round calculation result; the first destination operand register is used for setting a first destination operand, and the bit width of the first destination operand is the same as the bit width of the round calculation result;

the processor is configured with a single round calculation instruction; the round calculation unit is used for reading the message word and the message parameter corresponding to the current round of calculation from the first source operand of the first source operand register; calculating, according to the single round calculation instruction, the state word of the next state by using the state word of the previous state and the message word and message parameter corresponding to the current round of calculation; and storing the state word of the next state in the first destination operand of the first destination operand register.

In a third aspect, embodiments of the present application provide a chip including a processor as described above.

In a fourth aspect, embodiments of the present application provide an electronic device including a chip as described above.

According to the embodiments of the present application, on the basis of setting a first source operand register and a first destination operand register whose bit width is the same as that of the round calculation result, a plurality of message words and a plurality of message parameters for round calculation can be stored in the first source operand of the first source operand register, and the calculation result of each round of calculation can be stored in the first destination operand of the first destination operand register, so that the round calculation of the SM3 algorithm is built on a source operand and a destination operand with the same bit width as the round calculation result. Furthermore, when performing the current round of calculation, the embodiments of the present application may read the message word and the message parameter corresponding to the current round of calculation from the first source operand of the first source operand register; calculate, according to a single round calculation instruction, the state word of the next state by using the state word of the previous state and the message word and message parameter corresponding to the current round of calculation; and store the state word of the next state in the first destination operand of the first destination operand register. Therefore, with source and destination operand registers whose bit width is the same as that of the round calculation result, the embodiments of the present application can make full use of the operand register bit width at a vector granularity equal to the bit width of the round calculation result, and each round of calculation is implemented by a single round calculation instruction, which increases the round calculation speed and thus the processing speed of the SM3 algorithm.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings may be obtained according to the provided drawings without inventive effort to a person skilled in the art.

Fig. 1 is a diagram of an example processor implementation of the SM3 algorithm for 128-bit operand registers.

Fig. 2 is a diagram illustrating an example implementation of a processor of the SM3 algorithm provided in an embodiment of the present application.

Fig. 3 is a flowchart of a processing method of the SM3 algorithm provided in the embodiment of the present application.

Fig. 4 is a flowchart of another processing method of the SM3 algorithm provided in the embodiment of the present application.

Fig. 5 is a diagram of a hardware example for performing round calculation according to an embodiment of the present application.

Fig. 6A is a flowchart of another processing method of the SM3 algorithm according to an embodiment of the present application.

Fig. 6B is a flowchart of a method for calculating a status word by using the SM3 algorithm according to an embodiment of the present application.

Fig. 7 is an exemplary diagram of a round calculation unit provided in an embodiment of the present application.

Fig. 8 is a diagram of another hardware example of performing round computation according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

The SM3 algorithm can pad a message of length less than $2^{64}$ bits to form a message whose length is a multiple of 512 bits, group the padded message into a plurality of 512-bit message blocks, and then perform iterative compression on each 512-bit message block in sequence, outputting a 256-bit hash value corresponding to each message block. In the SM3 algorithm, the iterative compression of a message block mainly involves message expansion and round calculation.
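The padding and grouping step can be sketched in Python as follows. This is a minimal sketch rather than part of the described method: the function name `sm3_pad` is hypothetical, and the exact padding rule (a single 1 bit, then zero bits, then the 64-bit big-endian message length) is taken from the published SM3 standard, since this description only states that the padded length is a multiple of 512 bits.

```python
def sm3_pad(message: bytes) -> list[bytes]:
    """Pad `message` and split it into 64-byte (512-bit) blocks.

    Assumption: padding rule as in the published SM3 standard
    (0x80, zero bytes, 64-bit big-endian bit length).
    """
    bit_len = len(message) * 8
    if bit_len >= 1 << 64:
        raise ValueError("message must be shorter than 2**64 bits")
    # Append a single 1 bit followed by seven 0 bits.
    padded = message + b"\x80"
    # Add zero bytes until exactly 8 bytes remain in the last block.
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # Append the original length in bits as a 64-bit big-endian integer.
    padded += bit_len.to_bytes(8, "big")
    # Group into 512-bit message blocks.
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]
```

A 3-byte message fits in one block, while a 56-byte message no longer leaves room for the length field and spills into a second block.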

During message expansion, the message block may be divided into 16 initial message words $W_0$ to $W_{15}$, which are then expanded to generate 68 message words $W_0$ to $W_{67}$ and 64 message parameters $W'_0$ to $W'_{63}$. When generating the message parameters, the message parameter $W'_i$ may be obtained from the message words $W_i$ and $W_{i+4}$, e.g. $W'_i = W_i \oplus W_{i+4}$, $i \in \{0, \dots, 63\}$. When generating the message words, the message words $W_{16}$ to $W_{67}$ may be generated by expansion from the initial 16 message words $W_0$ to $W_{15}$ and the message words already obtained by expansion, for example:

$$W_m \leftarrow P_1\big(W_{m-16} \oplus W_{m-9} \oplus (W_{m-3} \lll 15)\big) \oplus (W_{m-13} \lll 7) \oplus W_{m-6}, \quad m \in \{16, \dots, 67\}$$
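The expansion recurrence above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the helper names `rotl32`, `p1` and `expand_block` are hypothetical, message words are assumed to be read big-endian from the block, and the permutation $P_1(X) = X \oplus (X \lll 15) \oplus (X \lll 23)$ is taken from the published SM3 standard because this description does not define it.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Cyclic left shift of a 32-bit word by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p1(x: int) -> int:
    # Assumption: P1 as defined in the published SM3 standard.
    return x ^ rotl32(x, 15) ^ rotl32(x, 23)


def expand_block(block: bytes) -> tuple[list[int], list[int]]:
    """Return (W[0..67], W'[0..63]) for one 512-bit message block."""
    if len(block) != 64:
        raise ValueError("a message block is 64 bytes")
    # The 16 initial message words W_0..W_15 (big-endian, 32 bits each).
    w = [int.from_bytes(block[4 * i:4 * i + 4], "big") for i in range(16)]
    # Message words W_16..W_67 from the recurrence.
    for m in range(16, 68):
        w.append(p1(w[m - 16] ^ w[m - 9] ^ rotl32(w[m - 3], 15))
                 ^ rotl32(w[m - 13], 7) ^ w[m - 6])
    # Message parameters W'_i = W_i xor W_{i+4}.
    w_prime = [w[i] ^ w[i + 4] for i in range(64)]
    return w, w_prime
```

For the padded single-block message "abc", this gives `W[16] == 0x9092E200`, which can be checked by hand from the recurrence because every contributing word except `W[0]` is zero.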

During round calculation, the SM3 algorithm starts from the 8 initial state words $(A_0, B_0, C_0, D_0, E_0, F_0, G_0, H_0)$ and incorporates the corresponding message word and message parameter in each round, so that the 8 state words output by the final round are obtained through multiple rounds of iterative round calculation; the 8 state words output by the final round may form a 256-bit hash value. In each round of calculation, the message word and message parameter corresponding to that round are used.

For each round of calculation, the processor needs to read the source operands of the round (e.g., the message word and message parameter corresponding to the current round of calculation) from the source operand registers and write the calculated state words to the destination operand register. With the 128-bit operand registers of the processor, the round calculation of SM3 is performed at a vector granularity of 128 bits, so the operand bit width does not match the 256-bit result of a round calculation, and the processing speed of SM3 is limited. That is, because of the operand bit width limitation of the operand registers (source operand registers and destination operand registers), each round of calculation of SM3 has to be carried out by a plurality of round calculation instructions, which limits the processing speed of SM3.

For ease of illustration, FIG. 1 illustrates an exemplary diagram of a processor implementation of an SM3 algorithm based on 128-bit operand registers, as shown in FIG. 1, the processor's instruction set including an SM3 two-round four state word update instruction 111 and an SM3 two-round four remaining state word update instruction 112 for round computation; and, the operand registers of the processor include: source operand register 121 and source operand register 122 having a bit width of 128 bits, and destination operand register 131 and destination operand register 132 having a bit width of 128 bits;

In round calculation, since message words and message parameters are 32 bits each, the 128-bit source operand register 121 may hold 4 message words for 4 rounds, and the 128-bit source operand register 122 may hold 4 message parameters for 4 rounds. Meanwhile, since the bit width of one destination operand register is 128 bits while the result of one round of calculation is 256 bits (i.e., a 256-bit hash value is formed from 8 state words, and one state word is 32 bits), the processor needs two round calculation instructions to complete the state word calculation of the current round. For example, the SM3 two-round four state word update instruction 111 calculates 4 state words of the current round based on the message words and message parameters stored in the source operand register 121 and the source operand register 122, and stores them in the destination operand register 131; the SM3 two-round four remaining state word update instruction 112 calculates the remaining 4 state words of the current round based on the message words and message parameters stored in the source operand register 121 and the source operand register 122, and stores them in the destination operand register 132.

It can be seen that, under the bit width constraint of the 128-bit operand registers, the round calculation of SM3 is performed at a vector granularity of 128 bits, so that one round of calculation has to be split across multiple round calculation instructions, which limits the processing speed of the SM3 algorithm. Meanwhile, with the development of high-performance processors, vector components can provide wider operands and more parallel computation; if SM3 calculation is still performed at 128-bit vector granularity, it cannot keep pace with the performance development of processors.

Based on this, the embodiments of the present application provide an improved processing scheme of the SM3 algorithm, and propose a single round calculation instruction based on an operand register (for example, a 256-bit operand register) with the same bit width as the round calculation result, so as to perform round calculation on the same vector granularity (for example, a 256-bit vector granularity) as the bit width of the round calculation result, thereby realizing an increase in the processing speed of the SM3 algorithm. That is, the bit width of the operand register is fully utilized on the basis that the bit width of the operand register corresponds to the bit width of the round calculation result, and each round of calculation is realized by a single round calculation instruction.

As an alternative implementation, taking a processor that uses 256-bit operand registers as an example, Fig. 2 illustrates an example diagram of a processor implementation of the SM3 algorithm provided in an embodiment of the present application. It should be noted that the bit width of the operand registers may be the same as the bit width of the round calculation result; since the round calculation result is currently 256 bits, the embodiments of the present application are described by taking an operand register bit width of 256 bits as an example. Referring to Fig. 2, the processor may include a round calculation unit 210 for round calculation and a message expansion unit 220 for message expansion; and the instruction set of the processor includes: a single round calculation instruction 211 for round calculation (for example, the round calculation unit 210 executes the single round calculation instruction 211 to implement the round calculation of the SM3 algorithm), and a single message expansion instruction 221 for message expansion (for example, the message expansion unit 220 executes the message expansion instruction 221 to implement the message expansion of the SM3 algorithm). Similar to the round calculation performed by the single round calculation instruction 211 at 256-bit vector granularity, the message expansion instruction 221 provided in the embodiments of the present application may also perform message expansion at 256-bit vector granularity. In one example, the single round calculation instruction 211 may be referred to as VSM3RND256 and the message expansion instruction 221 may be referred to as VSM3MSG256.

As further shown in fig. 2, operand registers may be provided in the processor, the provided operand registers may include: a first source operand register 231, a second source operand register 232, a third source operand register 233, a first destination operand register 241, and a second destination operand register 242; wherein each operand register (including each source operand register and each destination operand register) has the same bit width as the round result, e.g., each operand register has a bit width of 256 bits.

In the present embodiment, the first source operand register 231 and the first destination operand register 241 are used for round calculation. As an alternative implementation, the first source operand register 231 may hold message words and message parameters totalling 256 bits; for example, the first source operand register 231 may hold 4 message words $(W_j, W_{j+1}, W_{j+2}, W_{j+3})$ and 4 message parameters $(W'_j, W'_{j+1}, W'_{j+2}, W'_{j+3})$, where $j$ represents the current round number of the round calculation, $W_{j+3}$ does not exceed $W_{67}$, and $W'_{j+3}$ does not exceed $W'_{63}$.

In the current round calculation, the round calculation unit 210 may execute the single round calculation instruction 211 to calculate 8 status words of the current round based on the corresponding message words and message parameters of the current round calculation stored in the first source operand register 231, and store the 8 status words generated by the current round calculation in the first destination operand register 241.

The second source operand register 232, the third source operand register 233 and the second destination operand register 242 are used for message expansion. As an alternative implementation, the second source operand register 232 may store the 8 message words with the largest sequence numbers that have been obtained, and the third source operand register 233 may store the 8 message words preceding the message word stored in the second source operand register 232; in performing message expansion, the message expansion unit 220 may execute the message expansion instruction 221 to expand the next 8 message words based on the message words stored in the second source operand register 232 and the third source operand register 233, and store the expanded 8 message words in the second destination operand register 242.

For example, when message expansion is performed on the basis of the 16 initially divided message words $W_0$ to $W_{15}$, the second source operand register 232 may store the 8 message words $W_8$ to $W_{15}$ (one message word is 32 bits), and the third source operand register 233 may store the 8 message words $W_0$ to $W_7$ that precede $W_8$ to $W_{15}$. The message expansion unit 220 may execute the message expansion instruction 221 to expand the next 8 message words $W_{16}$ to $W_{23}$ based on the 16 message words $W_0$ to $W_{15}$ stored in the second source operand register 232 and the third source operand register 233; the 8 message words $W_{16}$ to $W_{23}$ obtained by expansion may be stored in the second destination operand register 242.
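The register-level example above can be modelled in software as follows. This is only a behavioural sketch under stated assumptions: `vsm3msg256_model` is a hypothetical name, registers are represented as lists of eight 32-bit words in ascending index order (the description does not specify lane order), and $P_1$ is taken from the published SM3 standard. The model computes the 8 new words sequentially; it says nothing about how hardware would do so within one beat.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Cyclic left shift of a 32-bit word by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p1(x: int) -> int:
    # Assumption: P1 as defined in the published SM3 standard.
    return x ^ rotl32(x, 15) ^ rotl32(x, 23)


def vsm3msg256_model(src3: list[int], src2: list[int]) -> list[int]:
    """Model of one message expansion step.

    src3 holds W[k..k+7] (the older 8 words), src2 holds W[k+8..k+15]
    (the 8 words with the largest sequence numbers obtained so far).
    Returns the next 8 words W[k+16..k+23].
    """
    if len(src3) != 8 or len(src2) != 8:
        raise ValueError("each 256-bit operand holds eight 32-bit words")
    window = list(src3) + list(src2)
    for _ in range(8):
        m = len(window)
        # Note: from the fourth new word onward, window[m - 3] is itself a
        # word produced earlier in this same step.
        window.append(p1(window[m - 16] ^ window[m - 9]
                         ^ rotl32(window[m - 3], 15))
                      ^ rotl32(window[m - 13], 7) ^ window[m - 6])
    return window[16:]
```

Feeding the result back as the new `src2` and the old `src2` as the new `src3` yields the following 8 words, so the whole schedule can be produced 8 words at a time.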

Based on the single round calculation instruction and the message expansion instruction provided by the embodiment of the application, the following describes a processing method of the SM3 algorithm provided by the embodiment of the application. As an alternative implementation, fig. 3 illustrates a flowchart of a processing method of the SM3 algorithm provided in an embodiment of the present application, where the method flow may be implemented by a processor executing a single round calculation instruction (e.g., a round calculation unit in the processor executes a single round calculation instruction, implementing the method flow). Referring to fig. 3, the method flow may include the following steps.

In step S31, reading a message word and a message parameter corresponding to the current round of calculation from a first source operand of a first source operand register; wherein the first source operand includes a plurality of message words and message parameters and the bit width of the first source operand is the same as the bit width of the round calculation result.

In the embodiments of the present application, when a first source operand register with the same bit width as the round calculation result is set, the first source operand can be set through the first source operand register, and the bit width of the first source operand is the same as the bit width of the round calculation result. For example, when a 256-bit-wide first source operand register is set, a 256-bit-wide first source operand may be set through the first source operand register. The first source operand may be regarded as the source operand of the round calculation; since message words and message parameters are used in the course of round calculation, the first source operand may include a plurality of message words and a plurality of message parameters whose total bit width is the same as the bit width of the round calculation result. For example, if the current round of calculation is the $j$-th round, the first source operand may include 4 message words $(W_j, W_{j+1}, W_{j+2}, W_{j+3})$ and 4 message parameters $(W'_j, W'_{j+1}, W'_{j+2}, W'_{j+3})$.

Based on the first source operand stored in the first source operand register, the embodiments of the present application can read the message word and the message parameter corresponding to the current round of calculation from the first source operand when the current round of calculation is performed. As an alternative implementation, the embodiments of the present application may read, from the first source operand, the message word and the message parameter corresponding to the current round number of the round calculation. In one example, the round calculation may be performed for 64 rounds; then, in the $j$-th round of calculation, the embodiments of the present application may read the message word $W_j$ and the message parameter $W'_j$ from the first source operand.
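Reading a round's inputs from the packed source operand can be sketched as follows. The layout here is an assumption made for illustration, not something the description fixes: the 256-bit operand is modelled as a Python integer, lane k occupies bits 32k to 32k+31, lanes 0 to 3 hold the message words and lanes 4 to 7 hold the message parameters of four consecutive rounds, and the lane is selected as the round number modulo 4. The names `pack_round_operand` and `read_round_inputs` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def pack_round_operand(w: list[int], w_prime: list[int], base: int) -> int:
    """Pack W[base..base+3] and W'[base..base+3] into one 256-bit integer.

    Assumed layout: lanes 0-3 = message words, lanes 4-7 = message parameters.
    """
    lanes = list(w[base:base + 4]) + list(w_prime[base:base + 4])
    if len(lanes) != 8:
        raise ValueError("need four message words and four message parameters")
    operand = 0
    for k, word in enumerate(lanes):
        operand |= (word & MASK32) << (32 * k)
    return operand


def read_round_inputs(operand: int, round_number: int) -> tuple[int, int]:
    """Select (W_j, W'_j) for round j from a packed 256-bit operand.

    Assumption: the operand was packed with base = 4 * (j // 4), so the
    lane index is j mod 4.
    """
    lane = round_number % 4
    w_j = (operand >> (32 * lane)) & MASK32
    w_prime_j = (operand >> (32 * (lane + 4))) & MASK32
    return w_j, w_prime_j
```

With this layout one packed operand serves four consecutive rounds, which matches the description's point that the full 256-bit width of the register is used.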

In step S32, according to the single round calculation instruction, the state word of the next state is calculated by using the state word of the previous state and the message word and message parameter corresponding to the current round of calculation.

After the message word and the message parameter corresponding to the current round of calculation are read from the first source operand stored in the first source operand register, the embodiments of the present application can calculate the state word of the next state according to the single round calculation instruction; the state word of the next state can be regarded as the calculation result of the current round of calculation. Optionally, when calculating the state word of the next state from the state word of the previous state, the single round calculation instruction may incorporate the message word and the message parameter corresponding to the current round of calculation, so as to obtain the calculation result of the current round.

In one example, assuming that the $j$-th round of round calculation is currently being performed, based on the state word of the previous state $(A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i)$ and the message word $W_j$ and message parameter $W'_j$ corresponding to the current round of calculation, the embodiments of the present application can execute a single round calculation instruction to calculate the state word of the next state $(A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1})$. For example, in the first round of calculation, the state word of the previous state is the initial state word $(A_0, B_0, C_0, D_0, E_0, F_0, G_0, H_0)$, the calculated state word of the next state is $(A_1, B_1, C_1, D_1, E_1, F_1, G_1, H_1)$, and so on.

In some embodiments, in the process of executing the single round calculation instruction to calculate the state word of the next state, the embodiments of the present application may calculate the first partial state word of the next state from the state word of the previous state; calculate intermediate variables based on the state word of the previous state and the message word and message parameter corresponding to the current round of calculation; and then determine the second partial state word of the next state based on the intermediate variables. The state words may be divided into a first partial state word and a second partial state word, and the calculated first partial state word and second partial state word of the next state together form the state word of the next state.

As an optional implementation, the embodiments of the present application may set (B, C, D, F, G, H) as the first partial state word and (A, E) as the second partial state word, where the next state of the first partial state word (B, C, D, F, G, H) may be determined directly from the state word of the previous state; for the next state of the second partial state word (A, E), the embodiments of the present application may determine intermediate variables based on the state word of the previous state and the message word and message parameter corresponding to the current round of calculation, and then determine the next state of the second partial state word based on the calculated intermediate variables.

In one example, assume that the current round number of the round calculation is $j$ ($j$ may be stored in the immediate imm8), and that the calculation result of the round calculation is $(A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1})$. Then, based on the state word of the previous state $(A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i)$ and the message word $W_j$ and message parameter $W'_j$ corresponding to the current round of calculation, the embodiments of the present application may, by executing a single round calculation instruction, calculate the first partial state word $(B_{i+1}, C_{i+1}, D_{i+1}, F_{i+1}, G_{i+1}, H_{i+1})$ of the next state and the intermediate variables SS1, SS2, TT1 and TT2; further, the second partial state word $(A_{i+1}, E_{i+1})$ of the next state is calculated from the intermediate variables TT1 and TT2.

As an alternative implementation, the calculation process of the single round calculation instruction may be represented by the following formula, for example:

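$$SS1 \leftarrow \big((A_i \lll 12) + E_i + (T_j \lll j)\big) \lll 7$$

$$SS2 \leftarrow SS1 \oplus (A_i \lll 12)$$

$$TT1 \leftarrow FF_j(A_i, B_i, C_i) + D_i + SS2 + W'_j$$

$$TT2 \leftarrow GG_j(E_i, F_i, G_i) + H_i + SS1 + W_j$$

$$D_{i+1} \leftarrow C_i$$

$$C_{i+1} \leftarrow B_i \lll 9$$

$$B_{i+1} \leftarrow A_i$$

$$A_{i+1} \leftarrow TT1$$

$$H_{i+1} \leftarrow G_i$$

$$G_{i+1} \leftarrow F_i \lll 19$$

$$F_{i+1} \leftarrow E_i$$

$$E_{i+1} \leftarrow P_0(TT2)$$

where $\oplus$ represents a 32-bit exclusive-or operation; $\lll$ (written <<< in plain text) represents a 32-bit cyclic left shift operation; $\leftarrow$ is the left assignment operator; $+$ is addition modulo $2^{32}$; $T_j$ is an algorithm constant that takes different values as $j$ changes; $P_0$ represents the permutation function in round calculation, and $P_0(X)$ can be expressed as:

$$P_0(X) = X \oplus (X \lll 9) \oplus (X \lll 17)$$

$FF_j$ and $GG_j$ represent Boolean functions that take different expressions as $j$ changes; specifically:

$$FF_j(X, Y, Z) = \begin{cases} X \oplus Y \oplus Z, & 0 \le j \le 15 \\ (X \wedge Y) \vee (X \wedge Z) \vee (Y \wedge Z), & 16 \le j \le 63 \end{cases}$$

$$GG_j(X, Y, Z) = \begin{cases} X \oplus Y \oplus Z, & 0 \le j \le 15 \\ (X \wedge Y) \vee (\neg X \wedge Z), & 16 \le j \le 63 \end{cases}$$

Further, $\wedge$ represents a 32-bit AND operation, $\vee$ represents a 32-bit OR operation, and $\neg$ represents a 32-bit NOT operation.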
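The round formulas can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the pieces the text leaves to the SM3 standard are filled in from that standard, namely the constants $T_j$ (0x79CC4519 for rounds 0 to 15, 0x7A879D8A for rounds 16 to 63), the relation SS2 = SS1 xor (A <<< 12), and the definitions of $P_0$, $FF_j$ and $GG_j$. The state is passed as a tuple (A, B, C, D, E, F, G, H).

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Cyclic left shift of a 32-bit word by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p0(x: int) -> int:
    return x ^ rotl32(x, 9) ^ rotl32(x, 17)


def t_const(j: int) -> int:
    # Assumption: constants from the published SM3 standard.
    return 0x79CC4519 if j < 16 else 0x7A879D8A


def ff(j: int, x: int, y: int, z: int) -> int:
    return x ^ y ^ z if j < 16 else (x & y) | (x & z) | (y & z)


def gg(j: int, x: int, y: int, z: int) -> int:
    return x ^ y ^ z if j < 16 else ((x & y) | (~x & z)) & MASK32


def sm3_round(state: tuple, w_j: int, w_prime_j: int, j: int) -> tuple:
    """One round: previous state plus (W_j, W'_j) gives the next state."""
    a, b, c, d, e, f, g, h = state
    # Intermediate variables.
    ss1 = rotl32((rotl32(a, 12) + e + rotl32(t_const(j), j)) & MASK32, 7)
    ss2 = ss1 ^ rotl32(a, 12)
    tt1 = (ff(j, a, b, c) + d + ss2 + w_prime_j) & MASK32
    tt2 = (gg(j, e, f, g) + h + ss1 + w_j) & MASK32
    # B, C, D, F, G, H depend only on the previous state;
    # A and E come from the intermediate variables TT1 and TT2.
    return (tt1, a, rotl32(b, 9), c, p0(tt2), e, rotl32(f, 19), g)
```

Only the new A and E require arithmetic on the message inputs; the other six words are moves or fixed rotations of the previous state, which is the split into first and second partial state words described earlier.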

In step S33, the status word of the next status is stored in a first destination operand of a first destination operand register, where the bit width of the first destination operand is the same as the bit width of the round calculation result.

After calculating the state word of the next state, the embodiments of the present application may store the state word of the next state in a first destination operand, where the first destination operand is set in a first destination operand register, and the bit width of the first destination operand is the same as the bit width of the round calculation result. For example, the embodiments of the present application may set a 256-bit-wide first destination operand register that can store a 256-bit first destination operand, so that after the 256-bit state word $(A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1})$ of the next state is calculated, it can be stored in the first destination operand of the first destination operand register, thereby making full use of the bit width of the first destination operand register.

In some embodiments, the round calculation unit in the processor may execute the single round calculation instruction provided in the embodiments of the present application, so as to implement the method flow shown in Fig. 3.

In one implementation example, the source operand for round calculation is ymm1, the destination operand is ymm0, and the current round number $j$ of the round calculation is stored in the immediate imm8. Since the round calculation result is 256 bits, ymm1 and ymm0 may each be 256 bits wide and be set in a 256-bit-wide source operand register and a 256-bit-wide destination operand register, respectively. With the single round calculation instruction VSM3RND256 provided in the embodiments of the present application, a single round calculation operation may be performed at 256-bit vector granularity based on the 256-bit source operand register and destination operand register, and one round of calculation may be completed in 2 beats. For example, in the $j$-th round of calculation, ymm1 stores 4 message words $(W_j, W_{j+1}, W_{j+2}, W_{j+3})$ and 4 message parameters $(W'_j, W'_{j+1}, W'_{j+2}, W'_{j+3})$, 256 bits in total; the single round calculation instruction VSM3RND256 can calculate the $j$-th round calculation result $(A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1})$ and store it in ymm0.
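A behavioural model of this example, with ymm0 and ymm1 represented as 256-bit Python integers, might look like the sketch below. Several points are assumptions made for illustration rather than details given in the description: ymm0 is assumed to hold the previous state on entry (lane 0 = A up to lane 7 = H, lane k at bits 32k to 32k+31), ymm1 is assumed to hold message words in lanes 0 to 3 and message parameters in lanes 4 to 7 with the lane chosen as imm8 mod 4, and the constants, Boolean functions and the final XOR with the input state in `compress_block` come from the published SM3 standard. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
# Initial state words, from the published SM3 standard.
IV = (0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,
      0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E)


def rotl32(x, n):
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def lanes(reg):
    """Split a 256-bit integer into eight 32-bit lanes (lane 0 = low bits)."""
    return [(reg >> (32 * k)) & MASK32 for k in range(8)]


def pack(words):
    """Pack eight 32-bit words into a 256-bit integer (inverse of lanes)."""
    reg = 0
    for k, word in enumerate(words):
        reg |= (word & MASK32) << (32 * k)
    return reg


def vsm3rnd256_model(ymm0, ymm1, imm8):
    """Model of one round: returns the new 256-bit value of ymm0."""
    a, b, c, d, e, f, g, h = lanes(ymm0)
    src = lanes(ymm1)
    j = imm8
    w_j, w_prime_j = src[j % 4], src[4 + j % 4]   # assumed lane selection
    t_j = 0x79CC4519 if j < 16 else 0x7A879D8A
    if j < 16:
        ff_v, gg_v = a ^ b ^ c, e ^ f ^ g
    else:
        ff_v = (a & b) | (a & c) | (b & c)
        gg_v = ((e & f) | (~e & g)) & MASK32
    ss1 = rotl32((rotl32(a, 12) + e + rotl32(t_j, j)) & MASK32, 7)
    ss2 = ss1 ^ rotl32(a, 12)
    tt1 = (ff_v + d + ss2 + w_prime_j) & MASK32
    tt2 = (gg_v + h + ss1 + w_j) & MASK32
    e_next = tt2 ^ rotl32(tt2, 9) ^ rotl32(tt2, 17)   # P0(TT2)
    return pack([tt1, a, rotl32(b, 9), c, e_next, e, rotl32(f, 19), g])


def compress_block(v, w, w_prime):
    """Run 64 modelled round instructions over one expanded block."""
    ymm0 = pack(v)
    ymm1 = 0
    for j in range(64):
        if j % 4 == 0:
            # Reload the source operand once every four rounds.
            ymm1 = pack(list(w[j:j + 4]) + list(w_prime[j:j + 4]))
        ymm0 = vsm3rnd256_model(ymm0, ymm1, j)
    # Final XOR with the input state, as in the SM3 standard.
    return tuple(x ^ y for x, y in zip(lanes(ymm0), v))
```

In this model each round is one call with a full 256-bit source and a full 256-bit destination, and the source operand only needs to be reloaded once per four rounds.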

In further embodiments, the embodiments of the present application may utilize a source operand register and a destination operand register having the same bit width as the bit width of the round calculation result when performing message expansion, thereby completing a round of message word expansion by a single message expansion instruction. For example, embodiments of the present application may complete a round of expansion of 8 message words in 1 beat with a single message expansion instruction. As an alternative implementation, fig. 4 illustrates another processing method flowchart of the SM3 algorithm provided in the embodiment of the present application, where the method flowchart may be implemented by the processor executing a single message expansion instruction (e.g., the message expansion instruction is executed by a message expansion unit in the processor to implement the method flowchart). Referring to fig. 4, the method flow may include the following steps.

In step S41, a message word for message expansion is read from the second source operand of the second source operand register and the third source operand of the third source operand register; the bit widths of the second source operand and the third source operand are the same as the bit width of the round calculation result, the second source operand comprises the first number of message words with the largest obtained sequence number, and the third source operand comprises the first number of message words before the second source operand.

When a second source operand register and a third source operand register with the same bit width as the round calculation result are provided, the embodiments of the present application can hold the second source operand in the second source operand register and the third source operand in the third source operand register, so that the bit widths of the second source operand and the third source operand are the same as that of the round calculation result. For example, when the second source operand register and the third source operand register are 256 bits wide, they can hold a 256-bit second source operand and a 256-bit third source operand, respectively.

The second source operand and the third source operand may be considered as source operands for message expansion. In some embodiments, when performing message word expansion, the embodiments of the present application may expand a first number of message words with the same bit width as the round calculation result in one beat (one beat may be considered as one clock cycle of the processor) by a single message expansion instruction. As an alternative implementation, the second source operand may include the first number of message words with the largest sequence number that have been obtained, and the third source operand may include the first number of message words before the second source operand, so that the embodiments of the present application may utilize the second source operand and the third source operand to expand the first number of message words after the second source operand by a single beat.

In one example, since the round calculation result is 256 bits and one message word is 32 bits, the second source operand may include the 8 message words with the largest sequence numbers obtained so far, and the third source operand may include the 8 message words before those of the second source operand. For example, given the message words W₀ to W₁₅ obtained by initially dividing the message block, the second source operand may hold the 8 message words with the largest sequence numbers, W₈ to W₁₅, and the third source operand may hold the 8 message words before them, W₀ to W₇. Based on W₀ to W₁₅, a single message expansion instruction expands the next 8 message words, W₁₆ to W₂₃, in one beat. Similarly, when the next 8 message words W₂₄ to W₃₁ are expanded, the contents of the second source operand and the third source operand are adjusted accordingly: the second source operand holds the 8 message words with the largest sequence numbers, W₁₆ to W₂₃, and the third source operand holds the 8 message words before them, W₈ to W₁₅, and so on.

Based on the message words stored in the second source operand and the third source operand, when message word expansion is performed, the embodiments of the present application can read the message words used for message expansion from the second source operand and the third source operand, so as to expand the next message word from the read message words. As an alternative implementation, when one message word is expanded, a plurality of message words with specified sequence numbers before the message word currently to be expanded may be read from the second source operand and the third source operand, for example the 16th, 13th, 9th, 6th and 3rd message words before it. In one example, assuming the message word W_m is currently to be expanded, a total of 5 message words, W_{m-16}, W_{m-13}, W_{m-9}, W_{m-6} and W_{m-3}, can be read from the second source operand and the third source operand.
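To make the operand lookup concrete, the following Python sketch reports, for a message word W_m produced by one expansion step, where each of its 5 inputs would be found. The function name `locate_inputs` and the `base` parameter are hypothetical; the sketch assumes only the arrangement described in the text, in which the third source operand holds the 8 earlier words and the second source operand holds the 8 most recent ones.

```python
def locate_inputs(m, base):
    """For message word W_m, report where each of its 5 inputs lives.

    base is the sequence number of the first word in the third source operand:
    the third source operand holds W_base..W_{base+7}, the second source
    operand holds W_{base+8}..W_{base+15}, and the step produces
    W_{base+16}..W_{base+23}.
    """
    if not base + 16 <= m < base + 24:
        raise ValueError("W_m is not produced by this expansion step")
    places = {}
    for offset in (16, 13, 9, 6, 3):
        idx = m - offset
        if idx < base + 8:
            places[idx] = "third source operand"
        elif idx < base + 16:
            places[idx] = "second source operand"
        else:
            places[idx] = "expanded in the same step"
    return places
```

Under this model, the input W_{m-3} of the later words in a step (W₁₉ to W₂₃ when base is 0) is itself produced within the same step, so those words cannot be formed from the two source operands alone; how the hardware resolves this within one beat is not described in this passage.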

In step S42, the first number of message words after the second source operand is generated is extended with the read message words according to a single message extension instruction.

After the message words for message expansion are read from the second source operand and the third source operand, embodiments of the present application may expand a first number of message words after generating the second source operand according to a single message expansion instruction, so as to complete the expansion of the first number of message words after the second source operand in one beat by the message expansion instruction, thereby completing the expansion of a plurality of message words with the same bit width as the round calculation result in one beat.

In one example, for the message word currently to be expanded, after the plurality of message words with the specified sequence numbers preceding it have been read from the second source operand and the third source operand, the embodiments of the present application may expand the message word from the read message words using the permutation function in message expansion and 32-bit exclusive-or operations. For example, assuming the message word W_m is currently to be expanded, W_{m-16}, W_{m-13}, W_{m-9}, W_{m-6} and W_{m-3} can be read from the second source operand and the third source operand, and the message word W_m can then be expanded by the following formula:

W_m = P₁(W_{m-16} ⊕ W_{m-9} ⊕ (W_{m-3} <<< 15)) ⊕ (W_{m-13} <<< 7) ⊕ W_{m-6}

where P₁ represents the permutation function in message expansion, ⊕ denotes the 32-bit exclusive-or operation, and <<< denotes a 32-bit cyclic left shift. For a message word X, P₁(X) can be expressed as:

P₁(X) = X ⊕ (X <<< 15) ⊕ (X <<< 23)

In step S43, the message words generated by the expansion are stored in the second destination operand of the second destination operand register; wherein the bit width of the second destination operand is the same as the bit width of the round calculation result.

After the first number of message words following the second source operand are generated by expansion, their total bit width is the same as that of the round calculation result, so they can be stored in the second destination operand of the second destination operand register.

In some embodiments, the message expansion unit in the processor may execute the message expansion instruction provided in the embodiments of the present application, so as to implement the method flow shown in fig. 4.

According to the embodiment of the application, on the basis of setting the second source operand register, the third source operand register and the second destination operand register with the same bit width as that of the round calculation result, the obtained first number of message words with the largest sequence number can be stored through the second source operand in the second source operand register, the first number of message words before the second source operand is stored through the third source operand in the third source operand register, the first number of message words after the second source operand generated by expansion are stored through the second destination operand in the second destination operand register, and therefore message expansion of the SM3 algorithm is established on the source operand and the destination operand with the same bit width as that of the round calculation result. Furthermore, within a beat of message extension, embodiments of the present application may read a message word for message extension from the second source operand of the second source operand register and the third source operand of the third source operand register; further, according to a single message expansion instruction, utilizing the read message words to expand the first number of message words after completing in one beat; the extended generated message word may be stored in a second destination operand of the second destination operand register. 
According to the embodiment of the application, the bit width of the operand register can be fully utilized on the basis that the bit width of the operand register is the same as that of the round calculation result, a plurality of message words with the same bit width as that of the round calculation result are generated in one beat by means of single message expansion instruction, the message expansion speed of the SM3 algorithm is improved, and the processing speed of the SM3 algorithm is further improved.

In one implementation example, assume that the source operands for message expansion are ymm1 and ymm2 and the destination operand is ymm0. Since the round calculation result is 256 bits, ymm1, ymm2 and ymm0 may each be 256 bits wide, with ymm1 and ymm2 held in 256-bit-wide source operand registers and ymm0 held in a 256-bit-wide destination operand register. With the message expansion instruction VSM3MSG256 provided in the embodiments of the present application, the expansion of 8 message words can be completed in 1 beat at 256-bit vector granularity. For example, given the initially divided message words W₀ to W₁₅, ymm1 stores the message words W₈ to W₁₅ and ymm2 stores the message words W₀ to W₇; the message expansion instruction VSM3MSG256 can generate the message words W₁₆ to W₂₃ in one beat through the corresponding formula and store them in ymm0. Subsequent message words are expanded in the same way.
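The expansion step described above can be modelled in software as follows. This is a minimal sketch of the arithmetic defined by the SM3 message expansion, not a description of the hardware; the function name `vsm3msg256_model` and the representation of each operand as a list of 8 integers in ascending sequence order are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def p1(x):
    """Permutation function P1 used in SM3 message expansion."""
    return x ^ rotl32(x, 15) ^ rotl32(x, 23)

def vsm3msg256_model(second_src, third_src):
    """Software model of one expansion step: 16 known words in, 8 new words out.

    third_src holds the 8 earlier words and second_src the 8 most recent ones,
    each as a list of 32-bit integers in ascending sequence order (assumed).
    """
    w = list(third_src) + list(second_src)  # w[0..15]
    for m in range(16, 24):
        w.append(
            p1(w[m - 16] ^ w[m - 9] ^ rotl32(w[m - 3], 15))
            ^ rotl32(w[m - 13], 7)
            ^ w[m - 6]
        )
    return w[16:]
```

Calling the model again with the returned words as the new `second_src` and the previous `second_src` as the new `third_src` mirrors the sliding of the source operands described for W₂₄ to W₃₁.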

In further embodiments, based on the round calculation provided in the embodiments of the present application, FIG. 5 illustrates an alternative hardware example for performing a round calculation; the hardware structure shown in FIG. 5 can complete one round calculation within 2 beats. As shown in FIG. 5, in the j-th round of calculation, the embodiments of the present application can calculate the 8 state words (A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1}) from the 8 state words (A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i), a message word W_j and a message parameter W_j'. The specific process is as follows:

in the first-stage pipeline FX1, A_i, C_i, E_i and G_i are latched for one beat into the second-stage pipeline FX2 based on the left assignment operator (←), yielding B_{i+1}, D_{i+1}, F_{i+1} and H_{i+1} in the second-stage pipeline; at the same time, B_i <<< 9 is executed to generate C_{i+1} in the second-stage pipeline, and F_i <<< 19 is executed to generate G_{i+1} in the second-stage pipeline;

in the first-stage pipeline FX1, T_j <<< j is executed based on the current round number j, and the result is input to the first 32-bit CSA unit (CSA32_1 shown in FIG. 5) together with A_i <<< 12 and E_i; CSA32_1 compresses the three inputs into a 2-term output, which is input to the first 32-bit adder (Adder32_1 shown in FIG. 5); the addition result of Adder32_1 undergoes <<< 7 to obtain the intermediate variable SS1. Note that a CSA (Carry Save Adder) unit may be 32 bits wide; if its inputs are a, b and c and its outputs are sum and carry, it computes sum = a ^ b ^ c and carry = (a & b) | (a & c) | (b & c);

further, in the first-stage pipeline FX1, SS1 is XORed with the result of A_i <<< 12 to obtain the intermediate variable SS2; the intermediate variables SS1 and SS2 may be stored in registers of the second-stage pipeline FX2 to wait for the next beat;

at the same time, in the first-stage pipeline FX1, the GG_j calculation is performed on j, E_i, F_i and G_i, and the GG_j result, H_i and W_j are input to the second 32-bit CSA unit (CSA32_2 shown in FIG. 5); the 2-term output of CSA32_2 is stored in an FX2 register to wait for the next beat; likewise, the FF_j calculation is performed on j, A_i, B_i and C_i, and the FF_j result, D_i and W_j' are input to the third 32-bit CSA unit (CSA32_3 shown in FIG. 5), whose 2-term output is stored in an FX2 register to wait for the next beat;

on the next beat, SS1 and the 2-term output of CSA32_2 held in the FX2 registers enter the fourth 32-bit CSA unit (CSA32_4 shown in FIG. 5); the 2-term output of CSA32_4 is input to the second 32-bit adder (Adder32_2 shown in FIG. 5) to generate TT2, and TT2 passes through the hardware computation of the P₀ function to yield E_{i+1};

SS2 and the 2-term output of CSA32_3 held in the FX2 registers enter the fifth 32-bit CSA unit (CSA32_5 shown in FIG. 5), and the 2-term output of CSA32_5 is input to the third 32-bit adder (Adder32_3 shown in FIG. 5) to generate TT1; A_{i+1} is obtained from TT1 based on the left assignment operator (←).
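The data flow just described can be checked against a small software model. The Python sketch below follows the same grouping of terms (a 3-input carry-save step followed by a 2-input addition) using the round constants and Boolean functions of the SM3 standard. The function names are hypothetical, the split into "first stage" and "second stage" is only indicated by comments, and the carry term is shifted left by one bit inside `csa32` so that the two outputs can be added directly, which is an assumption about how the adder consumes the carry output.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits (n is reduced modulo 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def csa32(a, b, c):
    """3:2 carry-save step: a + b + c == s + cy (mod 2**32)."""
    s = a ^ b ^ c
    cy = (((a & b) | (a & c) | (b & c)) << 1) & MASK32  # carry, pre-shifted
    return s, cy

def add32(x, y):
    return (x + y) & MASK32

def p0(x):
    return x ^ rotl32(x, 9) ^ rotl32(x, 17)

def ff(j, x, y, z):
    return x ^ y ^ z if j < 16 else (x & y) | (x & z) | (y & z)

def gg(j, x, y, z):
    return x ^ y ^ z if j < 16 else (x & y) | (~x & z & MASK32)

def sm3_round(j, state, w_j, w_prime_j):
    """One SM3 round: maps (A_i..H_i) to (A_{i+1}..H_{i+1})."""
    a, b, c, d, e, f, g, h = state
    t_j = 0x79CC4519 if j < 16 else 0x7A879D8A
    a12 = rotl32(a, 12)
    # First stage: SS1, SS2 and the two carry-save pairs
    ss1 = rotl32(add32(*csa32(rotl32(t_j, j), a12, e)), 7)
    ss2 = ss1 ^ a12
    gg_terms = csa32(gg(j, e, f, g), h, w_j)
    ff_terms = csa32(ff(j, a, b, c), d, w_prime_j)
    # Second stage: TT2 and TT1
    tt2 = add32(*csa32(ss1, *gg_terms))
    tt1 = add32(*csa32(ss2, *ff_terms))
    return (tt1, a, rotl32(b, 9), c, p0(tt2), e, rotl32(f, 19), g)
```

Iterating `sm3_round` for j from 0 to 63 over the expanded words of one message block, with W_j' taken as W_j ⊕ W_{j+4}, and XORing the final state with the state before the block gives the compression function of the SM3 standard.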

With the round calculation process shown in fig. 5, a single round calculation instruction can complete one round calculation in two beats; that is, of two consecutive beats, one beat is used to input the state words used by the round calculation in the first-stage pipeline, and the other beat generates the result of that round calculation in the second-stage pipeline. Part of the pipeline relationship is exemplified in Table 1 below:

TABLE 1

In the round calculation process shown in fig. 5, the result of each round calculation is 8 state words, and the result of one round calculation is used as the input of the next, so the round calculation is performed iteratively and the hash value is obtained in the final round; performing 64 iterations of round calculation takes the round calculation unit 128 beats. It can be seen that, in the multi-round calculation of the SM3 algorithm, there is a data dependency between successive round calculations, which forces the round calculations to be performed serially and iteratively, so the throughput of the round calculation needs to be improved.

Based on this, in still further embodiments, the embodiments of the present application provide a round calculation implementation that supports an internal bypass, together with a corresponding microarchitecture design: during round calculation, the state words obtained in the second-stage pipeline FX2 are returned to the first-stage pipeline FX1 and used in selecting the state words for the subsequent round calculation. This removes the data dependency between successive round calculations, so that consecutive round calculations of the SM3 algorithm can be performed in a pipelined manner; at the cost of a small amount of additional delay and hardware, each round calculation after the first is completed in one beat, which improves the throughput of the round calculation.

Based on the foregoing idea, as an alternative implementation, fig. 6A illustrates a flowchart of yet another processing method of the SM3 algorithm provided in the embodiment of the present application, where the method flowchart may be implemented by being executed by a processor, for example, by being executed by a round computing unit in the processor, and referring to fig. 6A, the method flowchart may include the following steps.

In step S610, the status word of the next status generated by the current round calculation is returned from the second-stage pipeline to the first-stage pipeline.

In step S611, if the next round calculation of the current round is performed, the returned state words of the next state are selected in the first-stage pipeline as the state words used for the next round calculation; if the current round calculation is performed, the state words of the previous state are selected in the first-stage pipeline as the state words used for the current round calculation.

In step S612, according to the single round calculation instruction, calculating at the current round, and obtaining a status word generated by the current round calculation based on the selected status word, the corresponding message word and the message parameter; and according to the single round calculation instruction, calculating in the next round of the current round, and obtaining the state word generated in the next round of calculation based on the selected state word, the corresponding message word and the message parameter.

As an alternative implementation, in the first beat of the first round of calculation, the embodiments of the present application may select the state words of the initial state as the state words used in the first round of calculation, so as to calculate the state words of the next state. After the state words of the next state are calculated in the second-stage pipeline, they can be returned to the first-stage pipeline through a bypass mechanism, so that the state words used by the subsequent round of calculation are selected from them and the state words of the previous state already present in the first-stage pipeline. In a possible implementation, assume that the current round calculation is the j-th round calculation, the state words of the previous state are A_i to H_i, and the state words of the next state generated by the j-th round calculation are A_{i+1} to H_{i+1}. When the j-th round calculation is performed, A_{i+1} to H_{i+1} have not yet been generated, so A_i to H_i can be selected as the state words used by the j-th round calculation. The state words A_{i+1} to H_{i+1} generated by the j-th round calculation are returned to the first-stage pipeline through the bypass; with the bypass enabled, A_{i+1} to H_{i+1} can be selected in the first-stage pipeline at round j+1 as the state words used by the (j+1)-th round calculation. Further, the state words A_{i+2} to H_{i+2} generated by the (j+1)-th round calculation can be returned to the first-stage pipeline through the bypass and take part in the state word selection together with the state words A_{i+1} to H_{i+1} of the previous state in the first-stage pipeline, and so on.
Therefore, the embodiment of the application returns the state word generated by each round of calculation to the first-stage running water through the bypass, so that the process of inputting the state word again in each round of calculation is omitted, and the number of beats occupied by inputting the state word can be greatly saved in the round of calculation.
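The selection behaviour described above can be sketched as a loop with a 2-way selector in front of the round logic. This is a behavioural sketch only: `run_rounds`, `round_fn` and `inputs` are hypothetical names, `round_fn` stands in for whatever round calculation logic is used, and no claim is made about the timing of the actual circuit.

```python
def run_rounds(initial_state, round_fn, inputs):
    """Iterate round_fn, choosing each round's input state with a 2-way selector.

    round_fn(j, state, x) stands in for the round calculation logic;
    inputs[j] stands in for the message word and parameter of round j.
    Returns the final state and the list of states selected in each round.
    """
    bypassed = None  # state returned from the second stage, if any
    trace = []
    for j, x in enumerate(inputs):
        # Selector: the first round uses the externally supplied state,
        # every later round uses the state returned through the bypass.
        selected = initial_state if bypassed is None else bypassed
        trace.append(selected)
        bypassed = round_fn(j, selected, x)
    return bypassed, trace
```

Only the first iteration reads `initial_state`; every later iteration consumes the value fed back from the previous one, which corresponds to omitting the separate state input step in later rounds.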

As an alternative implementation, in the first beat of the first round calculation, the initial state words are input in the first-stage pipeline, and the state words used by subsequent round calculations are obtained by returning the state words generated in the second-stage pipeline to the first-stage pipeline. Thus, apart from the first beat of the first round calculation, in which the initial state words are input in the first-stage pipeline, in any two subsequent consecutive beats the state words of one round calculation can be obtained in the second-stage pipeline in one beat and returned to the first-stage pipeline, and, based on the returned state words, the state words of the next round calculation can be obtained in the second-stage pipeline in the other beat. That is, the state words of one round calculation are generated in the second-stage pipeline in one beat and returned to the first-stage pipeline, and the state words of the next round calculation can then be generated directly in the second-stage pipeline on the next beat, which avoids spending the next beat on inputting the state words used by the next round calculation. For ease of understanding, reference may be made to Table 2 below, which illustrates a partial example of the pipeline relationship in accordance with an embodiment of the present application.

TABLE 2

Under the bypass mechanism of the embodiments of the present application, the state words used by every round calculation other than the first are obtained by returning the state words generated by the previous round calculation to the first-stage pipeline, and the result of the next round calculation is generated directly in the second-stage pipeline on the next beat. Thus only the first round calculation needs 2 beats, while every other round calculation can be completed in one beat based on the bypass mechanism; for 64 rounds of calculation, the round calculation unit needs 65 beats (1×2 + 63). It can be seen that, compared with the round calculation process illustrated in fig. 5, returning the result of each round calculation from the second-stage pipeline to the first-stage pipeline through the bypass greatly improves the processing speed of the round calculation, and thereby further improves the processing speed of the SM3 algorithm.
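The beat counts quoted in the text (128 beats without the bypass, 65 beats with it) follow from simple arithmetic, reproduced in the short Python sketch below. The function names are hypothetical, and the model counts only round calculation beats as described in the text.

```python
def beats_without_bypass(rounds):
    """Each round occupies two beats: one input beat and one result beat."""
    return 2 * rounds

def beats_with_bypass(rounds):
    """The first round takes two beats; each later round adds one beat."""
    return 0 if rounds == 0 else 2 + (rounds - 1)

if __name__ == "__main__":
    print(beats_without_bypass(64), beats_with_bypass(64))
```

For 64 rounds the ratio is 128 to 65, that is, the bypass roughly halves the number of beats spent on round calculation.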

Under the bypass mechanism, the embodiments of the present application can further improve the calculation logic of the round calculation process so that it suits round calculation under the bypass mechanism. With the state words divided into a first part (e.g., B, C, D, F, G, H) and a second part (e.g., A, E), after the state words of the next state obtained by calculation (e.g., A_{i+1} to H_{i+1}) are returned from the second-stage pipeline to the first-stage pipeline, a selection can be made between the state words A_{i+1} to H_{i+1} of the next state and the state words A_i to H_i of the previous state.

Taking the current j-th round of calculation as an example, the embodiments of the present application can select the state words A_i to H_i of the previous state in the first-stage pipeline for calculation. As an alternative implementation, the selected A_i, C_i, E_i and G_i may each be processed based on the left assignment operator (←) to obtain B_{i+1}, D_{i+1}, F_{i+1} and H_{i+1} in the second-stage pipeline; meanwhile, in the first-stage pipeline, a cyclic left shift is performed on B_i (e.g., B_i <<< 9) to obtain C_{i+1} in the second-stage pipeline, and a cyclic left shift is performed on F_i (e.g., F_i <<< 19) to obtain G_{i+1} in the second-stage pipeline.

In some further embodiments, the intermediate state words As and Ac obtained in the second-stage pipeline may be returned to the first-stage pipeline through the bypass; the intermediate state words As and Ac are used to directly generate the round calculation result of state word A. Thus, a selection can be made between the cyclic left shift results of As and Ac on the one hand and the cyclic left shift result of the state word A_i together with 0 on the other. As an alternative implementation, if the next round of calculation after the j-th round is performed, the cyclic left shift results of the intermediate state words As and Ac are selected in the first-stage pipeline for the next round of calculation; if the current round calculation is performed, the cyclic left shift result of the state word A_i and 0 are selected in the first-stage pipeline for the current round calculation.

For the second part of the state words (A_{i+1}, E_{i+1}), based on the calculation of the first-stage pipeline, the embodiments of the present application can generate the state words A_{i+1} and E_{i+1} in the second-stage pipeline from the selected A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i, the message word W_j and the message parameter W_j'. As an alternative implementation, fig. 6B illustrates an alternative method flowchart for calculating state words in the SM3 algorithm according to an embodiment of the present application; as shown in fig. 6B, the method flow may include the following steps.

In step S621, in the first-stage pipeline, intermediate data for calculating the intermediate variables TT1 and TT2 are determined according to the selected state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i, the message word W_j and the message parameter W_j'.

As an alternative implementation, for the j-th round of calculation, the embodiments of the present application can, in the first-stage pipeline, perform the FF_j calculation on the selected state words A_i, B_i and C_i; a first CSA operation is performed on the FF_j result, the selected state word D_i and the message parameter W_j', and the first CSA operation result and the cyclic left shift result of the selected state word A_i are stored in registers of the second-stage pipeline;

at the same time, in the first-stage pipeline, a second CSA operation is performed on the cyclic left shift result of the selected A_i, 0, and the cyclic left shift result of T_j; a first addition operation is performed on the second CSA operation result, and a cyclic left shift is performed on the first addition result to obtain the intermediate variable S1;

at the same time, in the first-stage pipeline, a cyclic left shift is performed on the selected state word E_i to obtain the intermediate variable S2; the GG_j calculation is performed on the selected state words G_i, F_i and E_i, and a third CSA operation is performed on the GG_j result, the selected state word H_i and the message word W_j; a fourth CSA operation is then performed on the third CSA operation result and the intermediate variables S1 and S2, and the fourth CSA operation result is stored in a register of the second-stage pipeline; also in the first-stage pipeline, a second addition operation is performed on the intermediate variables S1 and S2 to obtain the intermediate variable SS1, which is stored in a register of the second-stage pipeline.

In step S622, in the second-stage pipeline, the intermediate variable TT1 is calculated from the intermediate data and the state word A_{i+1} is determined from TT1; and, in the second-stage pipeline, the intermediate variable TT2 is calculated from the intermediate data and the state word E_{i+1} is determined from TT2.

As an optional implementation, in the second-stage pipeline, a third addition operation may be performed on the fourth CSA operation result to obtain the intermediate variable TT2; TT2 is then passed through the P₀ function to obtain the state word E_{i+1}.

At the same time, in the second-stage pipeline, the embodiments of the present application can XOR the intermediate variable SS1 with the cyclic left shift result of the state word A_i to obtain the intermediate variable SS2, and perform a fifth CSA operation on SS2 and the first CSA operation result to obtain the intermediate state words As and Ac. On the one hand, As and Ac are returned to the first-stage pipeline; on the other hand, a fourth addition operation is performed on As and Ac to obtain the intermediate variable TT1, and TT1 is processed based on the left assignment operator to obtain the state word A_{i+1}.

Based on the method flow principle of the bypass mechanism provided by the embodiments of the present application, the embodiments of the present application further provide an alternative hardware example for performing round calculation. On the basis of the processor implementation illustrated in fig. 2, fig. 7 illustrates an alternative exemplary diagram of the round calculation unit provided in an embodiment of the present application. As illustrated in fig. 7, the round calculation unit may include: a bypass unit 710, multiple groups of selectors 720, and calculation logic 730;

the bypass unit 710 is configured to return the state words of the next state generated by the current round calculation from the second-stage pipeline to the first-stage pipeline;

the multiple groups of selectors 720 are configured to select, in the first-stage pipeline, the returned state words of the next state as the state words used for the next round calculation if the next round calculation of the current round is performed, and to select, in the first-stage pipeline, the state words of the previous state as the state words used for the current round calculation if the current round calculation is performed;

calculation logic 730, configured to calculate, at a current round according to the single round calculation instruction, a status word generated by the current round calculation based on the status word, the corresponding message word, and the message parameter selected by the multiple groups of selectors; and according to the single round calculation instruction, calculating in the next round of the current round, and obtaining the state word generated in the next round of calculation based on the selected state word, the corresponding message word and the message parameter.

In further embodiments, the bypass unit 710 may be further configured to return intermediate state words As and Ac obtained in the second stage pipeline to the first stage pipeline; intermediate state words As and Ac are used to directly generate a round of calculation results of state word a;

the multiple groups of selectors 720 are further configured to select, in the first-stage pipeline, the cyclic left shift results of the intermediate state words As and Ac for the next round calculation if the next round calculation of the j-th round is performed, and to select, in the first-stage pipeline, the cyclic left shift result of the state word A_i and 0 for the current round calculation if the current round calculation is performed.

As an alternative implementation, fig. 8 illustrates another alternative hardware example diagram for performing round calculation provided in the embodiment of the present application; except for the first round calculation, the hardware structure shown in fig. 8 may complete one round calculation in 1 beat. As shown in fig. 8, taking the current round calculation as the j-th round calculation, the j-th round calculation generates the state words A_{i+1} to H_{i+1} of the next state, and correspondingly the state words of the previous state are A_i to H_i; the multiple groups of selectors in the round calculation unit may include: a first set of selectors 810, a second set of selectors 820, and a third set of selectors 830;

the first set of selectors 810 may include a plurality of selectors (Mux); after the state words H_{i+1}, G_{i+1}, F_{i+1}, and E_{i+1} are returned to the first-stage pipeline through the bypass unit, the plurality of selectors may, when the next round calculation after the j-th round is performed, select the returned state words H_{i+1}, G_{i+1}, F_{i+1}, and E_{i+1} in the first-stage pipeline as the state words used for the next round calculation; when the current round calculation is performed, the plurality of selectors select the state words H_i, G_i, F_i, and E_i in the first-stage pipeline as the state words used for the current round calculation;

the second set of selectors 820 may include a plurality of selectors; after the state words A_{i+1}, B_{i+1}, C_{i+1}, and D_{i+1} are returned to the first-stage pipeline through the bypass unit, the plurality of selectors may, when the next round calculation after the j-th round is performed, select the returned state words A_{i+1}, B_{i+1}, C_{i+1}, and D_{i+1} in the first-stage pipeline as the state words used for the next round calculation; when the current round calculation is performed, the plurality of selectors select the state words A_i, B_i, C_i, and D_i in the first-stage pipeline as the state words used for the current round calculation;

the third set of selectors 830 may include a plurality of selectors; after the intermediate state words As and Ac used to derive the state word A_{i+1} are returned to the first-stage pipeline through the bypass, the plurality of selectors may, when the next round calculation after the j-th round is performed, select the cyclic left shift results of the returned As and Ac (e.g., As<<<12 and Ac<<<12 shown in FIG. 8) in the first-stage pipeline for the next round calculation; when the current round calculation is performed, the plurality of selectors select the cyclic left shift result of the state word A_i and 0 (e.g., A_i<<<12 and 0 shown in FIG. 8) in the first-stage pipeline for the current round calculation.
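The selection performed by the three groups of selectors can be sketched in software as follows. This is a minimal illustrative model under stated assumptions, not the hardware of FIG. 8: the function name `select_inputs`, the `bypass` flag, and the dictionary layout of the state words are hypothetical names introduced only for this sketch; only the choice between the returned values and the previous-state values follows the description above.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Cyclic left shift of a 32-bit word by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def select_inputs(bypass, prev_state, returned_state, a_s, a_c):
    """Model the three selector groups at the input of the first pipeline stage.

    prev_state / returned_state: dicts keyed 'A'..'H' (hypothetical layout).
    a_s, a_c: intermediate state words As and Ac returned over the bypass.
    bypass: True when the next round calculation is being performed.
    """
    src = returned_state if bypass else prev_state
    efgh = tuple(src[k] for k in "EFGH")   # first group of selectors (810)
    abcd = tuple(src[k] for k in "ABCD")   # second group of selectors (820)
    if bypass:                             # third group of selectors (830)
        pair = (rotl32(a_s, 12), rotl32(a_c, 12))
    else:
        pair = (rotl32(prev_state["A"], 12), 0)
    return efgh, abcd, pair
```

With `bypass=False` the model returns the previous-state words together with the pair (A_i<<<12, 0); with `bypass=True` it returns the bypassed words together with (As<<<12, Ac<<<12).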

In the process by which the calculation logic 730 obtains the state words generated by each round calculation based on the selected state words, the calculation of the first part of the state words may be the same as described in the corresponding part above; for example, in the j-th round calculation, the state words A_i, C_i, E_i, and G_i selected in the first-stage pipeline are each processed based on the left assignment operator (←), so as to generate the state words B_{i+1}, D_{i+1}, F_{i+1}, and H_{i+1} in the second-stage pipeline; meanwhile, in the first-stage pipeline, a cyclic left shift operation is performed on the selected state word B_i to generate the state word C_{i+1} in the second-stage pipeline; and, in the first-stage pipeline, a cyclic left shift operation is performed on the selected state word F_i to generate the state word G_{i+1} in the second-stage pipeline.
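The first part of the state update is pure data movement and can be sketched as below. The rotation amounts 9 and 19 are those of the standard SM3 compression function; the function names are illustrative assumptions and do not appear in the application.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Cyclic left shift of a 32-bit word by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def update_simple_words(a, b, c, e, f, g):
    """Return (B, C, D, F, G, H) of the next state from the previous state.

    B <- A, D <- C, F <- E, H <- G are plain assignments;
    C <- B <<< 9 and G <- F <<< 19 are cyclic left shifts.
    """
    return (a, rotl32(b, 9), c, e, rotl32(f, 19), g)
```

Six of the eight state words therefore need no arithmetic at all, which is why the critical path discussion concentrates on the state words A and E.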

For the calculation of the state words A and E, in order to reduce the critical path delay, the embodiment of the present application may adjust the calculation logic of the two pipeline stages; for example, in the first-stage pipeline, the calculation logic may determine, based on the selected state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i, the message word W_j, and the message parameter W_j', intermediate data for calculating the intermediate variables TT1 and TT2; in the second-stage pipeline, the calculation logic calculates the intermediate variable TT1 from the intermediate data and determines the state word A_{i+1} from TT1; and, in the second-stage pipeline, the calculation logic calculates the intermediate variable TT2 from the intermediate data and determines the state word E_{i+1} from TT2.

In a more specific alternative implementation of calculating the state words A and E, as shown in FIG. 8 and taking the j-th round calculation as an example, in the first-stage pipeline the calculation logic may perform the FF_j calculation on the state words A_i, B_i, and C_i selected by the second set of selectors 820; the FF_j result, the state word D_i selected by the second set of selectors 820, and the message parameter W_j' are sent to a first 3-input 2-output CSA unit (e.g., CSA32_1 shown in FIG. 8) to perform a first CSA operation; meanwhile, a cyclic left shift operation is performed on the state word A_i selected by the second set of selectors 820 (e.g., the cyclic left shift by 12 bits, <<<12, shown in FIG. 8); the first CSA operation result of CSA32_1 and the cyclic left shift result of the state word A_i are stored in registers of the second-stage pipeline;

meanwhile, in the first-stage pipeline, the calculation logic may send the cyclic left shift result of A_i and 0 selected by the third set of selectors 830, together with the cyclic left shift result of T_j (e.g., T_j<<<j shown in FIG. 8), into a second 3-input 2-output CSA unit (e.g., CSA32_2 shown in FIG. 8) to perform a second CSA operation; the second CSA operation result of CSA32_2 is input to a 32-bit first addition unit (e.g., Adder32_1 shown in FIG. 8) to perform a first addition operation; further, a cyclic left shift operation (e.g., the cyclic left shift by 7 bits, <<<7, shown in FIG. 8) is performed on the first addition result of Adder32_1, thereby obtaining an intermediate variable S1;

At the same time, in the first-stage pipeline, the calculation logic may perform a cyclic left shift operation on the state word E_i selected by the first set of selectors 810 (e.g., the cyclic left shift by 7 bits, <<<7, shown in FIG. 8) to obtain an intermediate variable S2; also, the calculation logic may perform the GG_j calculation on the state words G_i, F_i, and E_i selected by the first set of selectors 810; the GG_j result, the state word H_i selected by the first set of selectors 810, and the message word W_j are sent to a third 3-input 2-output CSA unit (e.g., CSA32_3 shown in FIG. 8) to perform a third CSA operation;

in the first-stage pipeline, the calculation logic may send the intermediate variables S1 and S2 and the third CSA operation result of CSA32_3 to a first 4-input 2-output CSA unit (e.g., CSA42_1 shown in FIG. 8) to perform a fourth CSA operation; the fourth CSA operation result of CSA42_1 is stored in a register of the second-stage pipeline;

meanwhile, in the first-stage pipeline, the calculation logic may send the intermediate variables S1 and S2 to a 32-bit second addition unit (e.g., Adder32_2 shown in FIG. 8) to perform a second addition operation and obtain an intermediate variable SS1; the intermediate variable SS1 is stored in a register of the second-stage pipeline.
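The CSA (carry save adder) units used in the first-stage pipeline can be modeled bit-wise as follows. This is a minimal sketch of the generic carry-save technique, not a description of the circuits in FIG. 8: the function names `csa32` and `csa42` merely echo the unit names CSA32 and CSA42, and building the 4-input 2-output compressor from two 3-input 2-output stages is one possible construction assumed here for illustration.

```python
MASK32 = 0xFFFFFFFF


def csa32(x, y, z):
    """3-input 2-output carry-save adder on 32-bit words.

    Returns (sum_word, carry_word) such that
    (sum_word + carry_word) mod 2**32 == (x + y + z) mod 2**32.
    No carry propagates between bit positions inside this function.
    """
    sum_word = x ^ y ^ z
    carry_word = (((x & y) | (x & z) | (y & z)) << 1) & MASK32
    return sum_word, carry_word


def csa42(w, x, y, z):
    """4-input 2-output compressor, built here from two 3:2 stages."""
    s1, c1 = csa32(w, x, y)
    return csa32(s1, c1, z)
```

Because a CSA defers carry propagation, several operands can be compressed to two words with only a few gate delays, and a single carry-propagating adder is needed at the end; this is the usual motivation for placing CSA units ahead of the 32-bit addition units.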

In the second-stage pipeline, the calculation logic may send the fourth CSA operation result of CSA42_1 to a 32-bit third addition unit (e.g., Adder32_3 shown in FIG. 8) to perform a third addition operation and obtain the intermediate variable TT2; the intermediate variable TT2 is passed through the P₀ function to obtain the state word E_{i+1}; at this point, E_{i+1} is returned to the first-stage pipeline through the bypass;

in the second-stage pipeline, the calculation logic may perform an exclusive-or (XOR) operation on the stored intermediate variable SS1 and the cyclic left shift result of the state word A_i to obtain an intermediate variable SS2; the intermediate variable SS2 and the first CSA operation result of CSA32_1 are sent to a fourth 3-input 2-output CSA unit (e.g., CSA32_4 shown in FIG. 8) to perform a fifth CSA operation and obtain the intermediate state words As and Ac; the intermediate state words As and Ac are, on the one hand, returned to the first-stage pipeline through the bypass and, on the other hand, sent to a 32-bit fourth addition unit (e.g., Adder32_4 shown in FIG. 8) to perform a fourth addition operation and obtain the intermediate variable TT1; the intermediate variable TT1 is then processed based on the left assignment operator to obtain the state word A_{i+1}; A_{i+1} is returned to the first-stage pipeline through the bypass.

The above shows an example process in which the calculation logic 730 selects the state words A_i to H_i and performs the round calculation when the j-th round calculation is performed. When the next round calculation after the j-th round is performed, the embodiment of the present application may, based on the bypass mechanism, correspondingly select the state words A_{i+1} to H_{i+1} for the round calculation; the implementation of that round calculation may likewise refer to the description of the corresponding parts above, except that the message word and the message parameter used are adjusted accordingly.
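For reference, the result that one round calculation is expected to produce can be checked against the round function of the published SM3 standard, sketched below. This sketch computes a round in a single step and does not model the two-stage split, the CSA units, or the bypass of FIG. 8; the function names are illustrative assumptions, while the constants T_j, the functions FF_j, GG_j, P₀, and the rotation amounts are those of the SM3 standard.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Cyclic left shift of a 32-bit word by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def ff(j, x, y, z):
    """Boolean function FF_j of the SM3 standard."""
    return x ^ y ^ z if j < 16 else (x & y) | (x & z) | (y & z)


def gg(j, x, y, z):
    """Boolean function GG_j of the SM3 standard."""
    return x ^ y ^ z if j < 16 else (x & y) | (~x & z & MASK32)


def p0(x):
    """Permutation function P0 of the SM3 standard."""
    return x ^ rotl32(x, 9) ^ rotl32(x, 17)


def sm3_round(j, state, w_j, w_prime_j):
    """One SM3 round: maps (A..H) of the previous state to the next state."""
    a, b, c, d, e, f, g, h = state
    t_j = 0x79CC4519 if j < 16 else 0x7A879D8A
    a12 = rotl32(a, 12)
    ss1 = rotl32((a12 + e + rotl32(t_j, j)) & MASK32, 7)
    ss2 = ss1 ^ a12
    tt1 = (ff(j, a, b, c) + d + ss2 + w_prime_j) & MASK32
    tt2 = (gg(j, e, f, g) + h + ss1 + w_j) & MASK32
    return (tt1, a, rotl32(b, 9), c, p0(tt2), e, rotl32(f, 19), g)
```

As a usage example, feeding the SM3 initial value and the first message word of the padded message "abc" (W_0 = W_0' = 0x61626380) into `sm3_round(0, ...)` gives the state after round 0, which can be compared with the worked example in the standard.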

It can be seen that, for the calculation of the state words A and E, the embodiment of the present application may return the intermediate state words As and Ac of the state word A to the first-stage pipeline; As and Ac are each cyclically shifted left by 12 bits and then selected against (A_i, 0) in the third set of selectors 830; if the next round calculation is performed, the bypass is enabled and the third set of selectors 830 selects (As, Ac) for the round calculation; otherwise, it selects (A_i, 0) for the round calculation; the selection result of the third set of selectors 830 and T_j<<<j are compressed by CSA32_2, and the compression result of CSA32_2 is then sent into Adder32_1; the result of Adder32_1 is cyclically shifted left by 7 bits to obtain the intermediate variable S1; on the other hand, after the first set of selectors 810 makes a two-way selection between E_{i+1} and E_i, the selection result is cyclically shifted left by 7 bits to obtain the intermediate variable S2;

further, S1, S2, and the two output results of CSA32_3 are sent to CSA42_1 to generate two compression results; after the two compression results generated by CSA42_1 are stored in the FX2 register, they pass through Adder32_3 and the P₀ function to obtain the one-round calculation result of the state word E;

meanwhile, S1 and S2 generate the intermediate variable SS1 through Adder32_2, and the intermediate variable SS1 is stored in the FX2 register; then, in the second-stage pipeline, the intermediate variable SS1 and A_i<<<12 undergo a 32-bit exclusive-or (XOR) operation to obtain the intermediate variable SS2; the intermediate variable SS2 and the operation result of CSA32_1 stored in the FX2 register are compressed by CSA32_4 to generate As and Ac; As and Ac are returned to the first-stage pipeline through the bypass and are simultaneously input to Adder32_4 to obtain the one-round calculation result of the state word A.
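The relationship between the intermediate state words As and Ac and the intermediate variable TT1 can be illustrated with the following sketch. It assumes the generic bit-wise carry-save construction for CSA32_1 and CSA32_4; the function names `tt1_carry_save` and `resolve` are hypothetical, and the sketch covers only the TT1 path described above.

```python
MASK32 = 0xFFFFFFFF


def csa32(x, y, z):
    """3-input 2-output carry-save adder on 32-bit words."""
    sum_word = x ^ y ^ z
    carry_word = (((x & y) | (x & z) | (y & z)) << 1) & MASK32
    return sum_word, carry_word


def tt1_carry_save(ff_out, d, w_prime_j, ss2):
    """Return (As, Ac), the carry-save form of TT1."""
    # First-stage pipeline: CSA32_1 compresses the FF_j result, D_i and W_j'.
    s1, c1 = csa32(ff_out, d, w_prime_j)
    # Second-stage pipeline: CSA32_4 compresses SS2 with the latched pair.
    return csa32(ss2, s1, c1)


def resolve(a_s, a_c):
    """Adder32_4: the carry-propagating addition of As and Ac yields TT1."""
    return (a_s + a_c) & MASK32
```

In this model `resolve(*tt1_carry_save(...))` equals FF_j(A, B, C) + D + SS2 + W_j' modulo 2^32, so As and Ac together carry the full information of A_{i+1} before the final addition is performed.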

According to the SM3 processing scheme provided by the embodiment of the present application, the round calculation can be performed at a vector granularity whose bit width equals that of the round calculation result (for example, at a vector granularity of 256 bits), so that the bit width of the operand registers of the vector unit is fully utilized and the processing speed of SM3 is improved. Further, because a data dependency exists between consecutive round calculations of the SM3 algorithm, the round calculations can only be executed serially, and the calculation throughput is not high; therefore, the embodiment of the present application further provides a microarchitecture design supporting an internal bypass, in which the state words and intermediate state words generated by one round calculation in the second-stage pipeline are returned to the first-stage pipeline through the bypass, so that the data dependency between consecutive round calculations is eliminated and consecutive round calculations of the SM3 algorithm can be executed in a pipelined manner. Except for the first round calculation, a throughput of one round calculation per beat can be achieved, which further improves the processing speed and calculation performance of the SM3 algorithm.
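The match between the operand bit width and the round calculation result (eight 32-bit state words, 256 bits in total) can be illustrated with the following sketch. The packing order, with the first word in the most significant bits, is an assumption made only for this illustration; the function names are likewise hypothetical.

```python
MASK32 = 0xFFFFFFFF


def pack_words(words):
    """Pack eight 32-bit words into one 256-bit integer.

    Assumed layout: words[0] occupies the most significant 32 bits.
    """
    if len(words) != 8:
        raise ValueError("expected exactly eight 32-bit words")
    value = 0
    for w in words:
        value = (value << 32) | (w & MASK32)
    return value


def unpack_words(value):
    """Split a 256-bit integer back into eight 32-bit words."""
    return [(value >> (32 * (7 - i))) & MASK32 for i in range(8)]
```

The same layout would hold four message words and four message parameters, since 4 × 32 + 4 × 32 = 256 bits.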

Further, the embodiment of the application also provides a chip, and the chip can comprise the processor provided by the embodiment of the application.

Further, the embodiment of the application also provides an electronic device, such as a terminal device or a server device, which may include the processor provided in the embodiment of the application.

The foregoing describes a number of embodiments provided by the present application. Where no conflict arises, the alternatives presented in the various embodiments may be combined and cross-referenced with one another, thereby extending to a variety of possible embodiments, all of which may be considered embodiments disclosed by the present application.

Although the embodiments of the present application are disclosed above, the present application is not limited thereto. Various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the invention, and the scope of the invention shall be defined by the appended claims.

Claims

1. A method for processing an SM3 algorithm, comprising:

storing the status word of the next state in a first destination operand of a first destination operand register, wherein the bit width of the first destination operand is the same as the bit width of a round calculation result;

returning the state word of the next state generated by the current round calculation from the second-stage pipeline to the first-stage pipeline through a bypass mechanism;

if the next round calculation after the current round is performed, selecting, in the first-stage pipeline, the returned state word of the next state as the state word used for the next round calculation; if the current round calculation is performed, selecting, in the first-stage pipeline, the state word of the previous state as the state word used for the current round calculation.

2. The method of claim 1, wherein the current round calculation is a j-th round calculation, and the first source operand comprising a plurality of message words and message parameters comprises:

the first source operand includes message words W_j, W_{j+1}, W_{j+2}, W_{j+3} and message parameters W_j', W_{j+1}', W_{j+2}', W_{j+3}', where j represents the number of the current round calculation;

the reading the message word and the message parameter corresponding to the current round calculation from the first source operand of the first source operand register includes:

reading, from the first source operand, the message word W_j and the message parameter W_j' corresponding to the j-th round calculation.

3. The method of claim 2, wherein the state words comprise A, B, C, D, E, F, G, H; the state words of the next state calculated by the j-th round calculation comprise: A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1}; the state words of the previous state required by the j-th round calculation comprise: A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i.

4. The method of claim 3, wherein the calculating the state word of the next state using the state word of the previous state and the message word and message parameter corresponding to the current round calculation according to a single round calculation instruction comprises:

according to the single round calculation instruction, using the state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i of the previous state, the message word W_j, and the message parameter W_j', calculating the state words A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1} of the next state;

the storing the state word of the next state in the first destination operand of the first destination operand register includes:

storing the state words A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1} of the next state in the first destination operand of the first destination operand register.

5. The method as recited in claim 1, further comprising:

reading message words used for message expansion from a second source operand of a second source operand register and a third source operand of a third source operand register; the bit widths of the second source operand and the third source operand are the same as the bit width of the round calculation result, the second source operand comprises a first number of already obtained message words having the largest sequence numbers, and the third source operand comprises the first number of message words preceding those of the second source operand;

according to a single message expansion instruction, generating by expansion, using the read message words, the first number of message words following those of the second source operand;

storing the message words generated by the expansion in a second destination operand of a second destination operand register; wherein the bit width of the second destination operand is the same as the bit width of the round calculation result.

6. The method of claim 5, wherein the second source operand comprising a first number of already obtained message words having the largest sequence numbers comprises: the second source operand comprises the 8 already obtained message words having the largest sequence numbers;

the third source operand comprising the first number of message words preceding those of the second source operand comprises: the third source operand comprises the 8 message words preceding those of the second source operand;

the generating by expansion, according to a single message expansion instruction and using the read message words, the first number of message words following those of the second source operand includes:

according to the message expansion instruction, generating by expansion, using the read message words, the 8 message words following those of the second source operand.

7. The method as recited in claim 4, further comprising:

returning the intermediate state words As and Ac obtained in the second-stage pipeline to the first-stage pipeline through a bypass; the intermediate state words As and Ac are used to directly generate the one-round calculation result of state word A;

if the next round calculation after the j-th round is performed, selecting, in the first-stage pipeline, the cyclic left shift results of the intermediate state words As and Ac for the next round calculation; if the current round calculation is performed, selecting, in the first-stage pipeline, the state word A_i and 0 for the current round calculation.

8. The method of claim 7, wherein the calculating, according to the single round calculation instruction and using the state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i of the previous state, the message word W_j, and the message parameter W_j', the state words A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1} of the next state comprises:

according to the single round calculation instruction, when the current round calculation is performed, obtaining the state word generated by the current round calculation based on the selected state word, the corresponding message word, and the message parameter;

the method further comprises:

according to the single round calculation instruction, when the next round calculation after the current round is performed, obtaining the state word generated by the next round calculation based on the selected state word, the corresponding message word, and the message parameter.

9. The method of claim 8, wherein the obtaining, according to the single round calculation instruction and when the current round calculation is performed, the state word generated by the current round calculation based on the selected state word, the corresponding message word, and the message parameter comprises:

for the j-th round calculation, in the first-stage pipeline, processing the selected A_i, C_i, E_i, and G_i respectively based on a left assignment operator to obtain B_{i+1}, D_{i+1}, F_{i+1}, and H_{i+1} in the second-stage pipeline; and, in the first-stage pipeline, performing a cyclic left shift operation on the selected B_i to obtain C_{i+1} in the second-stage pipeline; and, in the first-stage pipeline, performing a cyclic left shift operation on the selected F_i to obtain G_{i+1} in the second-stage pipeline;

and generating, in the second-stage pipeline, the state words A_{i+1} and E_{i+1} according to A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i selected in the first-stage pipeline, the message word W_j, and the message parameter W_j'.

10. The method according to claim 9, wherein the generating, in the second-stage pipeline, the state words A_{i+1} and E_{i+1} according to A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i selected in the first-stage pipeline, the message word W_j, and the message parameter W_j' comprises:

in the first-stage pipeline, determining, according to the selected state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i, the message word W_j, and the message parameter W_j', intermediate data for calculating the intermediate variables TT1 and TT2;

in the second-stage pipeline, calculating the intermediate variable TT1 from the intermediate data and determining the state word A_{i+1} from the intermediate variable TT1; and, in the second-stage pipeline, calculating the intermediate variable TT2 from the intermediate data and determining the state word E_{i+1} from the intermediate variable TT2.

11. The method of claim 10, wherein the determining, in the first-stage pipeline and according to the selected state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i, the message word W_j, and the message parameter W_j', intermediate data for calculating the intermediate variables TT1 and TT2 includes:

in the first-stage pipeline, performing the FF_j calculation on the selected state words A_i, B_i, and C_i; performing a first CSA operation according to the FF_j result, the selected state word D_i, and the message parameter W_j', wherein CSA is a carry save adder;

and performing a second CSA operation according to the cyclic left shift result of the selected A_i, 0, and the cyclic left shift result of T_j, performing a first addition operation on the second CSA operation result, and performing a cyclic left shift operation on the first addition result to obtain an intermediate variable S1;

and performing a cyclic left shift operation on the selected state word E_i to obtain an intermediate variable S2; performing the GG_j calculation on the selected state words G_i, F_i, and E_i, and performing a third CSA operation according to the GG_j result, the selected state word H_i, and the message word W_j;

and performing a fourth CSA operation on the third CSA operation result and the intermediate variables S1 and S2 to obtain a fourth CSA operation result; and performing a second addition operation on the intermediate variables S1 and S2 to obtain an intermediate variable SS1.

12. The method of claim 11, wherein the calculating, in the second-stage pipeline, the intermediate variable TT1 from the intermediate data and determining the state word A_{i+1} from the intermediate variable TT1 comprises:

performing an exclusive-or operation on the intermediate variable SS1 and the cyclic left shift result of the state word A_i to obtain an intermediate variable SS2; performing a fifth CSA operation according to the intermediate variable SS2 and the first CSA operation result to obtain the intermediate state words As and Ac; performing a fourth addition operation on the intermediate state words As and Ac to obtain the intermediate variable TT1; and processing the intermediate variable TT1 based on the left assignment operator to obtain the state word A_{i+1};

the calculating, in the second-stage pipeline, the intermediate variable TT2 from the intermediate data and determining the state word E_{i+1} from the intermediate variable TT2 comprises:

in the second-stage pipeline, performing a third addition operation on the fourth CSA operation result to obtain the intermediate variable TT2; and passing the intermediate variable TT2 through the P₀ function to obtain the state word E_{i+1}.

13. A processor, comprising: a round calculation unit and an operand register; the operand register includes: a first source operand register and a first destination operand register; wherein the first source operand register sets a first source operand, the first source operand including a plurality of message words and message parameters, and the first source operand having a bit width that is the same as a bit width of a round calculation result; the first destination operand register is used for setting a first destination operand, and the bit width of the first destination operand is the same as that of a round calculation result;

the processor is configured with a single round calculation instruction; the round calculation unit is used for reading a message word and a message parameter corresponding to the current round calculation from the first source operand of the first source operand register; according to the single round calculation instruction, calculating the state word of the next state using the state word of the previous state and the message word and message parameter corresponding to the current round calculation; storing the state word of the next state in the first destination operand of the first destination operand register; returning the state word of the next state generated by the current round calculation from the second-stage pipeline to the first-stage pipeline through a bypass mechanism; if the next round calculation after the current round is performed, selecting, in the first-stage pipeline, the returned state word of the next state as the state word used for the next round calculation; if the current round calculation is performed, selecting, in the first-stage pipeline, the state word of the previous state as the state word used for the current round calculation.

14. The processor of claim 13, wherein the current round of computation is a j-th round of computation, wherein the first source operand comprises a plurality of message words and message parameters comprising:

the round calculation unit is configured to read, from a first source operand in a first source operand register, a message word and a message parameter corresponding to a current round calculation, where the reading includes:

15. The processor of claim 14, wherein the state words comprise A, B, C, D, E, F, G, H; the state words of the next state calculated by the j-th round calculation comprise: A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1}; the state words of the previous state required by the j-th round calculation comprise: A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i;

the round calculation unit being used for calculating, according to a single round calculation instruction, the state word of the next state using the state word of the previous state and the message word and message parameter corresponding to the current round calculation includes:

according to the single round calculation instruction, using the state words A_i, B_i, C_i, D_i, E_i, F_i, G_i, H_i of the previous state, the message word W_j, and the message parameter W_j', calculating the state words A_{i+1}, B_{i+1}, C_{i+1}, D_{i+1}, E_{i+1}, F_{i+1}, G_{i+1}, H_{i+1} of the next state.

16. The processor of claim 13, wherein the processor further comprises: a message expansion unit; the operand register further includes: a second source operand register, a third source operand register, and a second destination operand register;

wherein the second source operand register sets a second source operand and the third source operand register sets a third source operand; the bit widths of the second source operand and the third source operand are the same as the bit width of the round calculation result, the second source operand comprises a first number of message words with the largest obtained sequence number, and the third source operand comprises a first number of message words before the second source operand;

the processor is configured with a single message expansion instruction; the message expansion unit is used for reading message words used for message expansion from the second source operand of the second source operand register and the third source operand of the third source operand register; according to the single message expansion instruction, generating by expansion, using the read message words, the first number of message words following those of the second source operand; and storing the message words generated by the expansion in a second destination operand of the second destination operand register.

17. The processor of claim 15, wherein the round calculation unit comprises:

a bypass unit, used for returning the state word of the next state generated by the current round calculation from the second-stage pipeline to the first-stage pipeline;

multiple groups of selectors, used for selecting, in the first-stage pipeline, the returned state word of the next state as the state word used for the next round calculation if the next round calculation after the current round is performed; and selecting, in the first-stage pipeline, the state word of the previous state as the state word used for the current round calculation if the current round calculation is performed;

calculation logic, used for obtaining, according to the single round calculation instruction and when the current round calculation is performed, the state word generated by the current round calculation based on the state word selected by the multiple groups of selectors, the corresponding message word, and the message parameter; and obtaining, according to the single round calculation instruction and when the next round calculation after the current round is performed, the state word generated by the next round calculation based on the selected state word, the corresponding message word, and the message parameter.

18. The processor of claim 17, wherein the bypass unit is further configured to return the intermediate state words As and Ac obtained in the second-stage pipeline to the first-stage pipeline; the intermediate state words As and Ac are used to directly generate the one-round calculation result of state word A;

the multiple groups of selectors are further used for selecting, in the first-stage pipeline, the cyclic left shift results of the intermediate state words As and Ac for the next round calculation if the next round calculation after the j-th round is performed; and selecting, in the first-stage pipeline, the state word A_i and 0 for the current round calculation if the current round calculation is performed.

19. The processor of claim 18, wherein the plurality of sets of selectors comprises:

a first group of selectors, comprising a plurality of selectors, configured to: when the round following the j-th round is being calculated, select, in the first pipeline stage, the returned state words H_{i+1}, G_{i+1}, F_{i+1} and E_{i+1} as the state words used for the next-round calculation; and when the current round is being calculated, select, in the first pipeline stage, the state words H_i, G_i, F_i and E_i as the state words used for the current-round calculation;

a second group of selectors, comprising a plurality of selectors, configured to: when the round following the j-th round is being calculated, select, in the first pipeline stage, the returned state words A_{i+1}, B_{i+1}, C_{i+1} and D_{i+1} as the state words used for the next-round calculation; and when the current round is being calculated, select, in the first pipeline stage, the state words A_i, B_i, C_i and D_i as the state words used for the current-round calculation;

a third group of selectors, comprising a plurality of selectors, configured to: when the round following the j-th round is being calculated, select, in the first pipeline stage, the cyclic left shift operation results of the returned intermediate state words As and Ac for the next-round calculation; and when the current round is being calculated, select, in the first pipeline stage, the state word A_i and 0 for the current-round calculation.

20. The processor of claim 19, wherein the calculation logic being configured to obtain, during the current-round calculation and according to the single round calculation instruction, the state word generated by the current-round calculation based on the state word selected by the multiple groups of selectors, the corresponding message word and the message parameter comprises:

for the j-th round calculation, in the first pipeline stage, processing the selected A_i, C_i, E_i and G_i respectively based on the left assignment operator, and latching the processing results in the second pipeline stage to obtain B_{i+1}, D_{i+1}, F_{i+1} and H_{i+1}; in the first pipeline stage, performing a cyclic left shift operation on the selected B_i to obtain C_{i+1} in the second pipeline stage; and, in the first pipeline stage, performing a cyclic left shift operation on the selected F_i to obtain G_{i+1} in the second pipeline stage;
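The six state words that claim 20 derives by assignment or rotation can be sketched in software as follows. The rotation amounts 9 and 19 are taken from the published SM3 standard and are not stated in the claim; the function name `pass_through_words` and the dictionary layout are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Cyclic left shift of a 32-bit word by n bit positions."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def pass_through_words(a: int, b: int, c: int, e: int, f: int, g: int) -> dict:
    """Next-state words that need no addition: plain copies and two rotations."""
    return {
        "B": a,               # B_{i+1} <- A_i
        "C": rotl32(b, 9),    # C_{i+1} <- B_i <<< 9  (amount from the SM3 standard)
        "D": c,               # D_{i+1} <- C_i
        "F": e,               # F_{i+1} <- E_i
        "G": rotl32(f, 19),   # G_{i+1} <- F_i <<< 19 (amount from the SM3 standard)
        "H": g,               # H_{i+1} <- G_i
    }
```

Only A_{i+1} and E_{i+1} depend on the adder chain, which is why the remaining words can simply be latched into the second stage.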

21. The processor of claim 20, wherein the calculation logic being configured to generate, in the second pipeline stage, the state words A_{i+1} and E_{i+1} based on A_i, B_i, C_i, D_i, E_i, F_i, G_i and H_i selected in the first pipeline stage, the message word W_j and the message parameter W_j' comprises:

in the first pipeline stage, determining, according to the selected state words A_i, B_i, C_i, D_i, E_i, F_i, G_i and H_i, the message word W_j and the message parameter W_j', intermediate data for calculating the intermediate variables TT1 and TT2; in the second pipeline stage, calculating the intermediate variable TT1 from the intermediate data and determining the state word A_{i+1} from the intermediate variable TT1; and, in the second pipeline stage, calculating the intermediate variable TT2 from the intermediate data and determining the state word E_{i+1} from the intermediate variable TT2.

22. The processor of claim 21, wherein the calculation logic being configured to determine, in the first pipeline stage, the intermediate data for calculating the intermediate variables TT1 and TT2 according to the selected state words A_i, B_i, C_i, D_i, E_i, F_i, G_i and H_i, the message word W_j and the message parameter W_j' comprises:

in the first pipeline stage, performing an FF_j operation on the state words A_i, B_i and C_i selected by the second group of selectors; sending the FF_j operation result, the state word D_i selected by the second group of selectors and the message parameter W_j' to a first 3-input 2-output CSA unit to perform a first CSA operation, where CSA denotes a carry-save adder;

and sending the cyclic left shift operation result of A_i and the 0 selected by the third group of selectors, together with the cyclic left shift operation result of T_j, to a second 3-input 2-output CSA unit to perform a second CSA operation; inputting the second CSA operation result to a 32-bit first addition unit to perform a first addition operation; and performing a cyclic left shift operation on the first addition operation result to obtain an intermediate variable S1;

and performing a cyclic left shift operation on the state word E_i selected by the first group of selectors to obtain an intermediate variable S2; performing a GG_j operation on the state words G_i, F_i and E_i selected by the first group of selectors; sending the GG_j operation result, the message word W_j and the state word H_i selected by the first group of selectors to a third 3-input 2-output CSA unit to perform a third CSA operation; and sending the intermediate variables S1 and S2 and the third CSA operation result to a first 4-input 2-output CSA unit to perform a fourth CSA operation;

and sending the intermediate variables S1 and S2 to a 32-bit second addition unit to perform a second addition operation, obtaining an intermediate variable SS1.
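The carry-save adders recited in claim 22 reduce several addends to a sum word and a carry word without propagating carries, so that only one full 32-bit addition is needed at the end. The sketch below is a bit-level software model under that general definition; `csa_3to2` and `csa_4to2` are hypothetical names, and building the 4-input unit from two chained 3-input units is an assumption about one possible structure, not necessarily the one used in the claimed hardware.

```python
MASK32 = 0xFFFFFFFF


def csa_3to2(x: int, y: int, z: int) -> tuple:
    """3-input 2-output carry-save adder on 32-bit words.

    Returns (s, c) with (s + c) mod 2**32 == (x + y + z) mod 2**32.
    """
    s = x ^ y ^ z                                    # per-bit sum, no carry propagation
    c = (((x & y) | (x & z) | (y & z)) << 1) & MASK32  # per-bit majority, moved up one position
    return s, c


def csa_4to2(w: int, x: int, y: int, z: int) -> tuple:
    """4-input 2-output compressor modelled as two chained 3:2 CSAs (an assumption)."""
    s1, c1 = csa_3to2(w, x, y)
    return csa_3to2(s1, c1, z)


def resolve(s: int, c: int) -> int:
    """Final carry-propagating 32-bit addition of the sum and carry words."""
    return (s + c) & MASK32
```

For example, `resolve(*csa_3to2(0xFFFFFFFF, 1, 2))` gives the same value as `(0xFFFFFFFF + 1 + 2) & 0xFFFFFFFF`.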

23. The processor of claim 22, wherein the calculation logic being configured to calculate, in the second pipeline stage, the intermediate variable TT1 from the intermediate data and to determine the state word A_{i+1} from the intermediate variable TT1 comprises:

in the second pipeline stage, performing an exclusive-OR operation on the intermediate variable SS1 and the cyclic left shift operation result of the state word A_i selected by the second group of selectors to obtain an intermediate variable SS2; sending the intermediate variable SS2 and the first CSA operation result to a fourth 3-input 2-output CSA unit to perform a fifth CSA operation, obtaining the intermediate state words As and Ac; sending the intermediate state words As and Ac to a 32-bit fourth addition unit to perform a fourth addition operation, obtaining the intermediate variable TT1; and processing the intermediate variable TT1 based on the left assignment operator to obtain the state word A_{i+1};

and the calculation logic being configured to calculate, in the second pipeline stage, the intermediate variable TT2 from the intermediate data and to determine the state word E_{i+1} from the intermediate variable TT2 comprises:

sending the fourth CSA operation result to a 32-bit third addition unit to perform a third addition operation, obtaining the intermediate variable TT2; and passing the intermediate variable TT2 through the P₀ function to obtain the state word E_{i+1}.
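For comparison with the hardware datapath in claims 21 to 23, the following is a plain software reference of the round function as defined in the published SM3 standard, wrapped in enough message expansion and padding to be checked against the standard "abc" test vector. It is a sketch for orientation only: it does not model the two pipeline stages, the CSA units or the As/Ac bypass, and all function names are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF
IV = (0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,
      0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E)


def rotl32(x, n):
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p0(x):
    return x ^ rotl32(x, 9) ^ rotl32(x, 17)


def p1(x):
    return x ^ rotl32(x, 15) ^ rotl32(x, 23)


def ff(j, x, y, z):
    return x ^ y ^ z if j < 16 else (x & y) | (x & z) | (y & z)


def gg(j, x, y, z):
    return x ^ y ^ z if j < 16 else (x & y) | (~x & MASK32 & z)


def sm3_round(j, state, w, w_prime):
    """One round: TT1 yields the new A, P0(TT2) yields the new E."""
    a, b, c, d, e, f, g, h = state
    t = 0x79CC4519 if j < 16 else 0x7A879D8A
    ss1 = rotl32((rotl32(a, 12) + e + rotl32(t, j)) & MASK32, 7)
    ss2 = ss1 ^ rotl32(a, 12)
    tt1 = (ff(j, a, b, c) + d + ss2 + w_prime) & MASK32
    tt2 = (gg(j, e, f, g) + h + ss1 + w) & MASK32
    return (tt1, a, rotl32(b, 9), c, p0(tt2), e, rotl32(f, 19), g)


def expand(block):
    """Message expansion: 68 words W_j and 64 words W_j' from one 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for j in range(16, 68):
        w.append(p1(w[j - 16] ^ w[j - 9] ^ rotl32(w[j - 3], 15))
                 ^ rotl32(w[j - 13], 7) ^ w[j - 6])
    return w, [w[j] ^ w[j + 4] for j in range(64)]


def sm3(message: bytes) -> str:
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", len(message) * 8)
    v = IV
    for off in range(0, len(padded), 64):
        w, w_prime = expand(padded[off:off + 64])
        s = v
        for j in range(64):
            s = sm3_round(j, s, w[j], w_prime[j])
        v = tuple(x ^ y for x, y in zip(v, s))
    return "".join(f"{x:08x}" for x in v)
```

In this reference form each round chains several 32-bit additions for TT1 and TT2; the claims instead compress the addends with carry-save adders and defer carry propagation to a single adder per result.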

24. A chip comprising a processor as claimed in any one of claims 13 to 23.

25. An electronic device comprising the chip of claim 24.