CN115658148A - Acceleration method of SM4 block cipher algorithm and instruction set processor - Google Patents

Publication number
CN115658148A
CN115658148A CN202211280193.1A CN202211280193A CN115658148A CN 115658148 A CN115658148 A CN 115658148A CN 202211280193 A CN202211280193 A CN 202211280193A CN 115658148 A CN115658148 A CN 115658148A
Authority
CN
China
Prior art keywords
round
instruction
algorithm
block cipher
bit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211280193.1A
Other languages
Chinese (zh)
Inventor
何军
陈子钰
姜军
尹飞
蒋生健
李媛
范好好
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI HIGH-PERFORMANCE INTEGRATED CIRCUIT DESIGN CENTER
Original Assignee
SHANGHAI HIGH-PERFORMANCE INTEGRATED CIRCUIT DESIGN CENTER

Landscapes

  • Storage Device Security (AREA)

Abstract

The invention relates to an acceleration method for the SM4 block cipher algorithm and an instruction set processor. The acceleration method is based on an SM4 extended instruction set and uses parallel pipelining and instruction-level parallelism to accelerate the SM4 block cipher algorithm, which comprises the SM4 key expansion algorithm and the SM4 encryption and decryption algorithm. The SM4 extended instruction set adopts a RISC architecture and comprises an SM4 round key generation instruction and an SM4 round function iteration instruction. The SM4 round key generation instruction uses a parallel generation algorithm for multiple SM4 round keys to accelerate the SM4 key expansion algorithm, and the SM4 round function iteration instruction uses a parallel execution algorithm for multiple SM4 rounds to accelerate the SM4 encryption and decryption algorithm. The instruction set processor supports parallel pipelined execution of the SM4 round key generation instruction and the SM4 round function iteration instruction, each with a latency of 9 beats (clock cycles). The invention can greatly increase the speed of executing the SM4 block cipher algorithm and simplify the software program.

Description

Acceleration method of SM4 block cipher algorithm and instruction set processor
Technical Field
The invention relates to the technical fields of processor design and information security, and in particular to an acceleration method for the SM4 block cipher algorithm and an instruction set processor.
Background
Cryptography is an important guarantee of information security. Countries around the world attach great importance to the research and implementation of cryptographic algorithms and have successively put forward their own standard cryptographic algorithm systems. The national cryptographic algorithms are a series of cryptographic algorithms and related specifications developed independently by the State Cryptography Administration of China to ensure the security of commercial cryptography in China. The SM4 block cipher algorithm, issued by the State Cryptography Administration on March 21, 2012, is an important component of China's standard cryptographic algorithm system, so increasing the speed at which a processor executes the SM4 block cipher algorithm is of great significance. With the adoption of domestic cryptography and network security measures, the SM4 block cipher algorithm is becoming widespread, and how to execute it more efficiently has become a research hotspot.
The SM4 block cipher algorithm is a typical block cipher. It mainly comprises the SM4 key expansion algorithm and the SM4 encryption and decryption algorithm. The plaintext block, ciphertext block, and key are each 128 bits long, and the encryption algorithm, decryption algorithm, and key expansion algorithm all use a 32-round nonlinear iterative structure. Data decryption and data encryption share the same algorithm structure: each consists of 32 identical iterations of the nonlinear round function followed by one reverse-order transformation R. The only difference is the order in which the round keys are used: the decryption round keys are the encryption round keys in reverse order.
The system parameters of the SM4 block cipher algorithm are denoted $FK = (FK_0, FK_1, FK_2, FK_3)$ and the fixed parameters are denoted $CK = (CK_0, CK_1, \ldots, CK_{31})$, where $FK_i$ ($i = 0, 1, 2, 3$) and $CK_i$ ($i = 0, 1, 2, \ldots, 31$) are 32-bit words used in the key expansion algorithm. The encryption key is denoted $MK = (MK_0, MK_1, MK_2, MK_3)$, where each $MK_i$ ($i = 0, 1, 2, 3$) is a 32-bit word. Using the key expansion algorithm, the encryption key generates the round keys $(rk_0, rk_1, \ldots, rk_{31})$, where each $rk_i$ ($i = 0, 1, 2, \ldots, 31$) is a 32-bit word.
Let the input be $(X_0, X_1, X_2, X_3)$ and the round key be $rk$. The round function of the SM4 block cipher algorithm is then $F(X_0, X_1, X_2, X_3, rk) = X_0 \oplus T(X_1 \oplus X_2 \oplus X_3 \oplus rk)$, where the composite permutation $T: Z_2^{32} \to Z_2^{32}$ is an invertible transformation composed of the nonlinear transformation $\tau$ and the linear transformation $L$, that is, $T(\cdot) = L(\tau(\cdot))$. The nonlinear transformation $\tau$ consists of four parallel S-boxes. Let the input be $A = (a_0, a_1, a_2, a_3)$ and the output be $B = (b_0, b_1, b_2, b_3)$; then $(b_0, b_1, b_2, b_3) = \tau(A) = (\mathrm{Sbox}(a_0), \mathrm{Sbox}(a_1), \mathrm{Sbox}(a_2), \mathrm{Sbox}(a_3))$, where the Sbox values are obtained from a lookup table. The output of the nonlinear transformation $\tau$ is the input of the linear transformation $L$. Let the input be $B \in Z_2^{32}$ and the output be $C \in Z_2^{32}$; then $C = L(B) = B \oplus (B \lll 2) \oplus (B \lll 10) \oplus (B \lll 18) \oplus (B \lll 24)$.
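The composition of the round function from $\tau$ and $L$ can be sketched in Python as follows. This is only an illustrative sketch: the names `rotl32`, `tau`, `linear_l`, `t_transform` and `round_f` are chosen here and do not come from the patent, and the S-box is passed in as a 256-entry table because its contents are not reproduced in this text. The identity table at the end is a placeholder used to exercise the data flow; it is not the SM4 S-box.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate the 32-bit word x left by n bits (the <<< operator)."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def tau(a, sbox):
    """Nonlinear transformation: apply the S-box to each of the 4 bytes of a."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= sbox[(a >> shift) & 0xFF] << shift
    return out

def linear_l(b):
    """Linear transformation L(B) = B ^ (B<<<2) ^ (B<<<10) ^ (B<<<18) ^ (B<<<24)."""
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24)

def t_transform(x, sbox):
    """Composite permutation T(x) = L(tau(x))."""
    return linear_l(tau(x, sbox))

def round_f(x0, x1, x2, x3, rk, sbox):
    """Round function F = X0 ^ T(X1 ^ X2 ^ X3 ^ rk)."""
    return x0 ^ t_transform(x1 ^ x2 ^ x3 ^ rk, sbox)

# Placeholder table (assumption for illustration only, NOT the SM4 S-box).
IDENTITY_SBOX = list(range(256))
```

With a real S-box table substituted for the placeholder, `round_f` computes one SM4 encryption round on four 32-bit words.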
The encryption algorithm of the SM4 block cipher algorithm consists of 32 iterations and one reverse-order transformation R. Let the plaintext input be $(X_0, X_1, X_2, X_3) \in (Z_2^{32})^4$, the ciphertext output be $(Y_0, Y_1, Y_2, Y_3) \in (Z_2^{32})^4$, and the round keys be $rk_i \in Z_2^{32}$ ($i = 0, 1, 2, \ldots, 31$). The encryption algorithm proceeds as follows:
(1) 32 iterations: $X_{i+4} = F(X_i, X_{i+1}, X_{i+2}, X_{i+3}, rk_i)$, $i = 0, 1, 2, \ldots, 31$;
(2) Reverse-order transformation: $(Y_0, Y_1, Y_2, Y_3) = R(X_{32}, X_{33}, X_{34}, X_{35}) = (X_{35}, X_{34}, X_{33}, X_{32})$.
The decryption transformation of the SM4 block cipher algorithm has the same structure as the encryption transformation; only the order in which the round keys are used differs. For decryption, the round keys are used in the order $(rk_{31}, rk_{30}, \ldots, rk_0)$.
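The shared structure of encryption and decryption can be illustrated with a short Python sketch. The function name `crypt_block` and the stand-in `toy_t` are assumptions made for this illustration; `toy_t` is an arbitrary 32-bit function and is not the SM4 permutation T. The point of the sketch is that running the same routine with the round keys reversed undoes the encryption, whatever T is.

```python
MASK32 = 0xFFFFFFFF

def crypt_block(block, round_keys, t):
    """Run the 32-round iteration followed by the reverse-order transformation R.

    block      -- 4 words (X0, X1, X2, X3)
    round_keys -- 32 round keys, in the order they are to be used
    t          -- callable implementing the permutation T on a 32-bit word
    """
    x = list(block)
    for rk in round_keys:
        # X_{i+4} = X_i ^ T(X_{i+1} ^ X_{i+2} ^ X_{i+3} ^ rk_i)
        x.append(x[-4] ^ t(x[-3] ^ x[-2] ^ x[-1] ^ rk))
    return (x[35], x[34], x[33], x[32])

def toy_t(v):
    """Arbitrary stand-in for T (illustration only, NOT the SM4 transformation)."""
    return (v * 0x9E3779B1 + 0x7F4A7C15) & MASK32

rks = [(i * 0x01010101) & MASK32 for i in range(32)]   # dummy round keys
pt = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)
ct = crypt_block(pt, rks, toy_t)                       # "encrypt"
recovered = crypt_block(ct, rks[::-1], toy_t)          # "decrypt": same code, keys reversed
```

Because each round only XORs a function of three words into the fourth, the round trip holds for any choice of T, which is why the same hardware or instruction can serve both directions.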
The round keys of the SM4 block cipher algorithm are generated from the encryption key by the key expansion algorithm.
Given the encryption key $MK = (MK_0, MK_1, MK_2, MK_3) \in (Z_2^{32})^4$, the round keys are generated as follows:
$(K_0, K_1, K_2, K_3) = (MK_0 \oplus FK_0, MK_1 \oplus FK_1, MK_2 \oplus FK_2, MK_3 \oplus FK_3)$,
$rk_i = K_{i+4} = K_i \oplus T'(K_{i+1} \oplus K_{i+2} \oplus K_{i+3} \oplus CK_i)$, $i = 0, 1, 2, \ldots, 31$; where:
(1) $T'$ is obtained from the composite permutation $T$ by replacing the linear transformation $L$ with $L'$:
$L'(B) = B \oplus (B \lll 13) \oplus (B \lll 23)$
(2) The system parameters FK take the following values (hexadecimal):
$FK_0 = (A3B1BAC6)$, $FK_1 = (56AA3350)$, $FK_2 = (677D9197)$, $FK_3 = (B27022DC)$.
(3) The fixed parameters CK take the following values:
$CK_i$ ($i = 0, 1, 2, \ldots, 31$) are the fixed parameters. Let $ck_{i,j}$ be the $j$-th byte of $CK_i$ ($i = 0, 1, 2, \ldots, 31$; $j = 0, 1, 2, 3$), that is, $CK_i = (ck_{i,0}, ck_{i,1}, ck_{i,2}, ck_{i,3}) \in (Z_2^8)^4$; then $ck_{i,j} = (4i + j) \times 7 \pmod{256}$.
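The parameter definitions above translate directly into a few lines of Python. The sketch below is illustrative: the function names `linear_l_prime`, `ck_word` and `initial_k` are chosen here, and byte $ck_{i,0}$ is taken as the most significant byte of $CK_i$.

```python
MASK32 = 0xFFFFFFFF

# System parameters FK_0..FK_3 (hexadecimal values listed in the text).
FK = (0xA3B1BAC6, 0x56AA3350, 0x677D9197, 0xB27022DC)

def rotl32(x, n):
    """Rotate the 32-bit word x left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def linear_l_prime(b):
    """L'(B) = B ^ (B<<<13) ^ (B<<<23), the linear part of T' in key expansion."""
    return b ^ rotl32(b, 13) ^ rotl32(b, 23)

def ck_word(i):
    """Build CK_i from its bytes ck_{i,j} = (4i + j) * 7 mod 256, j = 0..3."""
    word = 0
    for j in range(4):
        word = (word << 8) | (((4 * i + j) * 7) % 256)
    return word

CK = [ck_word(i) for i in range(32)]

def initial_k(mk):
    """(K_0..K_3) = (MK_0 ^ FK_0, ..., MK_3 ^ FK_3)."""
    return tuple(m ^ f for m, f in zip(mk, FK))
```

For example, `ck_word(0)` yields the bytes 0x00, 0x07, 0x0E, 0x15, so the fixed parameters never need to be stored as a table; hardware or software can derive them from the index alone.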
The encryption, decryption, and key expansion algorithms of the SM4 block cipher algorithm all consume substantial computing resources, so dedicated acceleration techniques are needed. Current methods for accelerating the SM4 block cipher algorithm fall into two categories: software implementations and hardware implementations. Software implementations include AES-NI instruction set acceleration, bitslicing, and SIMD-based parallel SM4 computation; their optimization headroom and range of application are limited, and they are exposed to security threats such as side-channel attacks. Hardware implementations improve the efficiency of the SM4 block cipher algorithm through key techniques such as composite-field arithmetic and use dedicated hardware such as FPGAs, ASICs, and GPUs to accelerate the algorithm; they offer high acceleration efficiency, but at high cost and with poor generality and scalability. Accelerating the SM4 block cipher algorithm through an instruction set architecture (ISA) extension would speed up execution of the algorithm while retaining scalability and design flexibility, and would effectively improve the performance of a general-purpose processor executing the SM4 block cipher algorithm.
At present, the problem of how to improve the SM4 performance of domestic processors that use an independently developed instruction set has not been effectively solved. There is therefore an urgent need for an SM4 acceleration method aimed at domestic processors, so as to increase the speed at which they execute the SM4 block cipher algorithm.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an acceleration method of an SM4 block cipher algorithm and an instruction set processor, which can greatly improve the speed of executing the SM4 block cipher algorithm and simplify software programs.
The technical solution adopted by the invention to solve this problem is as follows. An acceleration method for the SM4 block cipher algorithm is provided which, based on an SM4 extended instruction set, uses parallel pipelining and instruction-level parallelism to accelerate the SM4 block cipher algorithm, the algorithm comprising the SM4 key expansion algorithm and the SM4 encryption and decryption algorithm. The SM4 extended instruction set adopts a RISC architecture; instructions use a fixed-length 32-bit format, and source and destination operands are 256 bits wide. The SM4 extended instruction set comprises an SM4 round key generation instruction and an SM4 round function iteration instruction. The SM4 round key generation instruction uses a parallel generation algorithm for multiple SM4 round keys to accelerate the SM4 key expansion algorithm; this algorithm takes as input the previous four 32-bit intermediate keys $K_3 \sim K_0$ and the eight 32-bit fixed parameters $CK_7 \sim CK_0$ associated with the next 8 round keys, so that one execution generates 8 round keys during SM4 round key expansion. The SM4 round function iteration instruction uses a parallel execution algorithm for multiple SM4 rounds to accelerate the SM4 encryption and decryption algorithm; this algorithm takes as input two unrelated groups of the current 4 intermediate words, $W_3 \sim W_0$ and $W'_3 \sim W'_0$, together with the 8 round keys $rk_7 \sim rk_0$ used in the next 8 rounds, and during SM4 encryption or decryption produces the two unrelated groups of 4 intermediate words that result from executing 8 rounds of the SM4 encryption/decryption round function.
When parallel pipelining and instruction-level parallelism are used to accelerate the SM4 block cipher algorithm:
The encryption method comprises the following steps:
(A) Generate the round key iteration initial value $(K_3, K_2, K_1, K_0)$ using general-purpose instructions of the general-purpose processor;
(B) Taking the round key iteration initial value $(K_3, K_2, K_1, K_0)$ and the fixed parameters $CK_7 \sim CK_0$ as input, execute the 1st SM4 round key generation instruction to generate the 8 round keys $rk_7 \sim rk_0$ of the SM4 block cipher algorithm;
(C) Taking the round keys $rk_7 \sim rk_4$ and the fixed parameters $CK_{15} \sim CK_8$ as input, execute the 2nd SM4 round key generation instruction to generate the 8 round keys $rk_{15} \sim rk_8$ of the SM4 block cipher algorithm; at the same time, taking the round keys $rk_7 \sim rk_0$ and two groups of data-independent plaintext $(W_3, W_2, W_1, W_0, W'_3, W'_2, W'_1, W'_0)$ as input, execute the 1st SM4 round function iteration instruction to complete round function iterations 1 to 8 of the SM4 encryption algorithm and obtain the iterative working words $(W^{(7)}_3, W^{(7)}_2, W^{(7)}_1, W^{(7)}_0, W'^{(7)}_3, W'^{(7)}_2, W'^{(7)}_1, W'^{(7)}_0)$;
(D) Taking the round keys $rk_{15} \sim rk_{12}$ and the fixed parameters $CK_{23} \sim CK_{16}$ as input, execute the 3rd SM4 round key generation instruction to generate the 8 round keys $rk_{23} \sim rk_{16}$ of the SM4 block cipher algorithm; at the same time, taking the round keys $rk_{15} \sim rk_8$ and the iterative working words $(W^{(7)}_3, W^{(7)}_2, W^{(7)}_1, W^{(7)}_0, W'^{(7)}_3, W'^{(7)}_2, W'^{(7)}_1, W'^{(7)}_0)$ as input, execute the 2nd SM4 round function iteration instruction to complete round function iterations 9 to 16 of the SM4 encryption algorithm and obtain the iterative working words $(W^{(15)}_3, W^{(15)}_2, W^{(15)}_1, W^{(15)}_0, W'^{(15)}_3, W'^{(15)}_2, W'^{(15)}_1, W'^{(15)}_0)$;
(E) Taking the round keys $rk_{23} \sim rk_{20}$ and the fixed parameters $CK_{31} \sim CK_{24}$ as input, execute the 4th SM4 round key generation instruction to generate the 8 round keys $rk_{31} \sim rk_{24}$ of the SM4 block cipher algorithm; at the same time, taking the round keys $rk_{23} \sim rk_{16}$ and the iterative working words $(W^{(15)}_3, W^{(15)}_2, W^{(15)}_1, W^{(15)}_0, W'^{(15)}_3, W'^{(15)}_2, W'^{(15)}_1, W'^{(15)}_0)$ as input, execute the 3rd SM4 round function iteration instruction (VSM4R) to complete round function iterations 17 to 24 of the SM4 encryption algorithm and obtain the iterative working words $(W^{(23)}_3, W^{(23)}_2, W^{(23)}_1, W^{(23)}_0, W'^{(23)}_3, W'^{(23)}_2, W'^{(23)}_1, W'^{(23)}_0)$;
(F) Taking the round keys $rk_{31} \sim rk_{24}$ and the iterative working words $(W^{(23)}_3, W^{(23)}_2, W^{(23)}_1, W^{(23)}_0, W'^{(23)}_3, W'^{(23)}_2, W'^{(23)}_1, W'^{(23)}_0)$ as input, execute the 4th SM4 round function iteration instruction to complete round function iterations 25 to 32 of the SM4 encryption algorithm and obtain the iterative working words $(W^{(31)}_3, W^{(31)}_2, W^{(31)}_1, W^{(31)}_0, W'^{(31)}_3, W'^{(31)}_2, W'^{(31)}_1, W'^{(31)}_0)$;
(G) Output the iterative working words $(W^{(31)}_3, W^{(31)}_2, W^{(31)}_1, W^{(31)}_0, W'^{(31)}_3, W'^{(31)}_2, W'^{(31)}_1, W'^{(31)}_0)$ in reverse order to obtain the ciphertext result of the encryption algorithm $(Y_3, Y_2, Y_1, Y_0, Y'_3, Y'_2, Y'_1, Y'_0)$;
The decryption method comprises the following steps:
(a) Generate the round key iteration initial value $(K_3, K_2, K_1, K_0)$ using general-purpose instructions of the general-purpose processor;
(b) Taking the round key iteration initial value $(K_3, K_2, K_1, K_0)$ and the fixed parameters $CK_{31} \sim CK_0$ as source operand data, execute the SM4 round key generation instruction 4 times in sequence to generate the 32 round keys $rk_{31} \sim rk_0$ of the SM4 block cipher algorithm;
(c) Taking the round keys $rk_{31} \sim rk_0$ and the ciphertext $(Y_3, Y_2, Y_1, Y_0, Y'_3, Y'_2, Y'_1, Y'_0)$ as input, execute the SM4 round function iteration instruction 4 times in sequence to complete the 32 round function iterations of the SM4 decryption algorithm and obtain the iterative working words $(W^{(31)}_3, W^{(31)}_2, W^{(31)}_1, W^{(31)}_0, W'^{(31)}_3, W'^{(31)}_2, W'^{(31)}_1, W'^{(31)}_0)$;
(d) Output the iterative working words $(W^{(31)}_3, W^{(31)}_2, W^{(31)}_1, W^{(31)}_0, W'^{(31)}_3, W'^{(31)}_2, W'^{(31)}_1, W'^{(31)}_0)$ in reverse order to obtain the plaintext result of the decryption algorithm $(W_3, W_2, W_1, W_0, W'_3, W'_2, W'_1, W'_0)$.
The SM4 round key generation instruction uses the simple operation instruction format with an immediate, specifically VSM4KEY Va, #b, Vc. It specifies an operation on one operand in the 256-bit source register Va and an 8-bit immediate operand #b, with the result stored in the 256-bit destination register Vc, where [ 31.
The SM4 round key generation instruction operates as follows: from the previous 4 intermediate keys $K_3 \sim K_0$, it generates the next 8 round keys $rk_7 \sim rk_0$ of the SM4 block cipher algorithm. The intermediate keys $K_3 \sim K_0$ are stored in the high 128 bits of the source register Va, the fixed parameters $CK_7 \sim CK_0$ are determined by the immediate #b, and the result $rk_7 \sim rk_0$ is stored in the destination register Vc. The results $rk_7 \sim rk_0$ are equal to the intermediate keys $K_{11} \sim K_4$ respectively. For $i = 0, 1, 2, \ldots, 7$, $K_{i+4}$ is generated as $K_{i+4} = K_i$ XOR Temp2 XOR (Temp2 <<< 13) XOR (Temp2 <<< 23), where XOR denotes bitwise exclusive OR and <<< denotes a cyclic left shift. Temp2 is a 32-bit intermediate word generated as Temp2 = SBOX(Temp1), where the SBOX(X) function looks up the 4 bytes of X in the table in parallel to produce a new 32-bit value. Temp1 is a 32-bit intermediate word generated as Temp1 = $K_{i+1}$ XOR $K_{i+2}$ XOR $K_{i+3}$ XOR $CK_i$, where $CK_i$ is a 32-bit fixed parameter and $\{CK_7 \sim CK_0\}$ = SELCK(#b). The SELCK(#b) function determines the fixed parameters $CK_i$ used in the key expansion process of the SM4 block cipher algorithm: from #b it determines, according to the SM4 block cipher algorithm, the specific values of the eight 32-bit words $CK_7 \sim CK_0$. Executing the SM4 round key generation instruction once generates 8 round keys of the SM4 block cipher algorithm. Executing it 4 times in sequence, each time updating the high 128 bits of the source register Va with the high 128 bits of the destination register Vc just produced and incrementing the immediate #b by 1, generates the 32 round keys $rk_{31} \sim rk_0$ of the SM4 block cipher algorithm.
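A behavioural model of this instruction can be sketched in Python. Several details below are assumptions made for illustration and are not stated in the text: 256-bit registers are modelled as Python integers with 32-bit word k at bits 32k+31..32k, $K_0$ is placed in word 4 and $K_3$ in word 7 of Va, $rk_0$ is placed in word 0 of Vc, and `selck(b)` is assumed to select $CK_{8b} \ldots CK_{8b+7}$. The S-box is passed in as a table, and all function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32

def sbox_word(x, sbox):
    """SBOX(X): look up the 4 bytes of X in parallel."""
    return sum(sbox[(x >> s) & 0xFF] << s for s in (0, 8, 16, 24))

def selck(b):
    """Assumed SELCK: immediate b selects CK_{8b}..CK_{8b+7}."""
    def ck(i):
        return int.from_bytes(bytes(((4 * i + j) * 7) % 256 for j in range(4)), "big")
    return [ck(8 * b + n) for n in range(8)]

def get_word(reg, k):
    """Extract 32-bit word k from a register modelled as an integer."""
    return (reg >> (32 * k)) & MASK32

def vsm4key(va, b, sbox):
    """Model of VSM4KEY Va, #b, Vc: returns Vc holding rk_7..rk_0."""
    k = [get_word(va, 4 + n) for n in range(4)]   # K_0..K_3 from the high 128 bits
    ck = selck(b)
    for i in range(8):
        temp1 = k[i + 1] ^ k[i + 2] ^ k[i + 3] ^ ck[i]
        temp2 = sbox_word(temp1, sbox)
        k.append(k[i] ^ temp2 ^ rotl32(temp2, 13) ^ rotl32(temp2, 23))
    vc = 0
    for n in range(8):
        vc |= k[4 + n] << (32 * n)                # rk_n = K_{n+4} goes to word n
    return vc

def expand_key(k0_to_k3, sbox):
    """Issue the modelled instruction 4 times, feeding the high half of Vc back into Va."""
    va = sum(w << (32 * (4 + n)) for n, w in enumerate(k0_to_k3))
    rks = []
    for b in range(4):
        vc = vsm4key(va, b, sbox)
        rks += [get_word(vc, n) for n in range(8)]
        va = vc & (((1 << 128) - 1) << 128)       # keep only the high 128 bits
    return rks
```

The feedback step works because the high 128 bits of Vc hold $rk_7 \sim rk_4 = K_{11} \sim K_8$, which are exactly the four intermediate keys the next invocation needs.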
The SM4 round function iteration instruction uses the simple operation instruction format with register operands, specifically VSM4R Va, Vb, Vc. It specifies an operation on the two operands in the two 256-bit source registers Va and Vb, with the result stored in the 256-bit destination register Vc, where [ 31.
The SM4 round function iteration instruction operates as follows: from the current 4 intermediate words $W_3 \sim W_0$ and the 8 round keys $rk_7 \sim rk_0$ used in the next 8 rounds, it generates the 4 intermediate words $W_{11} \sim W_8$ that result from 8 iterations of the SM4 block cipher algorithm. Two 128-bit operations are performed in parallel: $W_3 \sim W_0$ is stored in the high or low 128 bits of the source register Va, $rk_7 \sim rk_0$ is stored in the source register Vb, and the result $W_{11} \sim W_8$ is stored in the corresponding high or low 128 bits of the destination register Vc. Over the 8 iterations, for $j = 0, 1, 2, \ldots, 7$, $W_{j+4}$ is generated as $W_{j+4} = W_j$ XOR Temp2 XOR (Temp2 <<< 2) XOR (Temp2 <<< 10) XOR (Temp2 <<< 18) XOR (Temp2 <<< 24), where XOR denotes bitwise exclusive OR and <<< denotes a cyclic left shift. Temp2 is a 32-bit intermediate word generated as Temp2 = SBOX(Temp1), where the SBOX(X) function looks up the 4 bytes of the 32-bit value X in the table in parallel to produce a new 32-bit value. Temp1 is a 32-bit intermediate word generated as Temp1 = $W_{j+1}$ XOR $W_{j+2}$ XOR $W_{j+3}$ XOR $rk_j$. Executing the SM4 round function iteration instruction once completes 8 iterations of the encryption/decryption round function for two groups of SM4 data. Executing it 4 times in sequence, each time loading the source register Vb with 8 new round keys and updating the source register Va with the data produced in the destination register Vc, generates the final 4 words of each of the two groups.
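The two-lane behaviour of this instruction can likewise be modelled in Python. As before, the register layout is an assumption made for illustration: registers are integers with word k at bits 32k+31..32k, $W_0$ sits in the lowest word of each 128-bit lane, and $rk_0$ sits in word 0 of Vb. The S-box is a parameter and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK128 = (1 << 128) - 1

def rotl32(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32

def sbox_word(x, sbox):
    """SBOX(X): look up the 4 bytes of X in parallel."""
    return sum(sbox[(x >> s) & 0xFF] << s for s in (0, 8, 16, 24))

def words(reg, count):
    """Split a register (integer) into `count` 32-bit words, word 0 first."""
    return [(reg >> (32 * k)) & MASK32 for k in range(count)]

def pack(ws):
    """Inverse of words(): word 0 goes to the least significant bits."""
    return sum(w << (32 * k) for k, w in enumerate(ws))

def rounds8(w, rks, sbox):
    """Eight rounds on one lane: w = [W_0..W_3], returns [W_8..W_11]."""
    w = list(w)
    for j in range(8):
        temp1 = w[j + 1] ^ w[j + 2] ^ w[j + 3] ^ rks[j]
        temp2 = sbox_word(temp1, sbox)
        w.append(w[j] ^ temp2 ^ rotl32(temp2, 2) ^ rotl32(temp2, 10)
                 ^ rotl32(temp2, 18) ^ rotl32(temp2, 24))
    return w[8:12]

def vsm4r(va, vb, sbox):
    """Model of VSM4R Va, Vb, Vc: both 128-bit lanes of Va use the same 8 round keys."""
    rks = words(vb, 8)
    lo = rounds8(words(va & MASK128, 4), rks, sbox)
    hi = rounds8(words(va >> 128, 4), rks, sbox)
    return pack(lo) | (pack(hi) << 128)
```

Four chained calls, each with the next 8 round keys in Vb and the previous Vc as Va, cover all 32 rounds for two independent blocks.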
The technical solution adopted by the invention to solve this problem further provides an instruction set processor in which an SM4 round key generation instruction execution unit and an SM4 round function iteration instruction execution unit are arranged on different execution pipelines and occupy different read and write ports of the register file;
the SM4 round key generation instruction execution unit has:
two inputs, for a 128-bit operand A and an 8-bit immediate operand B respectively;
one output, for the 256-bit execution result;
the SM4 round key generation instruction execution unit implements the shift operations and system parameter handling directly in hardware logic, and implements the SBOX operation by hardware table lookup, with the 8 round keys processed in a parallel pipeline; the unit can execute SM4 round key generation instructions in a pipelined manner;
the SM4 round function iteration instruction execution unit has:
two inputs, for a 256-bit operand A and a 256-bit operand B respectively;
one output, for the 256-bit execution result;
the SM4 round function iteration instruction execution unit implements the shift operations and system parameter handling directly in hardware logic, and implements the SBOX operation by hardware table lookup, with the 8 round functions processed in a parallel pipeline; the unit can execute SM4 round function iteration instructions in a pipelined manner.
The SM4 round key generation instruction execution unit has 8 iteration execution stages and 1 output stage, for a total execution latency of 9 beats; the SM4 round function iteration instruction execution unit likewise has 8 iteration execution stages and 1 output stage, for a total execution latency of 9 beats; the instruction set processor supports parallel pipelined execution of the SM4 round key generation instruction and the SM4 round function iteration instruction.
Advantageous effects
Due to the adoption of the technical scheme, compared with the prior art, the invention has the following advantages and positive effects:
In the invention, the SM4 round key generation instruction (VSM4KEY), based on the parallel generation algorithm for multiple SM4 round keys, generates several round keys in parallel, and the SM4 round function iteration instruction (VSM4R), based on the parallel execution algorithm for multiple SM4 rounds, executes several iterations of the SM4 encryption/decryption round function in parallel, so that execution of the SM4 block cipher algorithm is substantially accelerated. An SM4 program written with the SM4 extended instruction set can perform all functions of the SM4 key expansion algorithm and the SM4 encryption and decryption algorithm; the software is markedly simplified, the algorithm is easier to program, and its storage overhead is reduced.
The invention uses parallel pipelining and instruction-level parallelism to fully exploit the parallelism of the SM4 block cipher algorithm and the SM4 extended instruction set, markedly increasing the execution speed of the SM4 block cipher algorithm.
In the invention, the VSM4KEY and VSM4R instructions each have an execution latency of 9 beats, and two pipelines are provided that support parallel pipelined execution of these instructions. The shift operations in the algorithm are implemented directly in hardware logic; the system parameters, the fixed parameters, and the S-box values are likewise handled in hardware logic, and the SBOX(x) function is implemented by hardware table lookup. With this processor, round key generation and the encryption (decryption) iterations for 9 unrelated groups of SM4 data can be completed in as few as 54 (81) beats, greatly increasing the execution speed of the SM4 block cipher algorithm.
The invention exploits the parallelism of the key expansion algorithm and the encryption and decryption algorithm in the SM4 block cipher algorithm, is easy to port and extend, and is easy to integrate into the existing execution units of a general-purpose processor. It can be applied to RISC processors or dedicated cryptographic chips to improve their performance in executing the SM4 block cipher algorithm.
Drawings
FIG. 1 is a block diagram of the execution process of the SM4 extended instruction set;
FIG. 2 is a block diagram of an encryption algorithm implementation flow in a method of accelerating an SM4 block cipher algorithm;
FIG. 3 is a block diagram of a decryption algorithm implementation flow in a method of accelerating an SM4 block cipher algorithm;
FIG. 4 is a flow chart of the parallel generation algorithm for multiple SM4 round keys;
FIG. 5 is a flow chart of the parallel execution algorithm for multiple SM4 rounds;
FIG. 6 shows the simple operation instruction format with an immediate;
FIG. 7 shows the simple operation instruction format with register operands;
FIG. 8 is a block diagram of a VSM4KEY instruction execution unit;
FIG. 9 is a block diagram of a VSM4R instruction execution unit;
FIG. 10 is a block diagram of an execution pipeline of a processor core or processor of one embodiment of the invention.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention can be made by those skilled in the art after reading the teaching of the present invention, and these equivalents also fall within the scope of the claims appended to the present application.
The SM4 block cipher algorithm mainly comprises the SM4 key expansion algorithm and the SM4 encryption and decryption algorithm, both of which consist of 32 rounds of nonlinear iteration; the core of the algorithm lies in the round function of the key expansion algorithm and the round function of the encryption and decryption algorithm. The key to accelerating the SM4 block cipher algorithm is therefore to fully exploit the intrinsic parallelism of these two round functions and to execute the round functions in parallel as far as possible.
The inventors found that there is exploitable parallelism in both the round key generation of the SM4 key expansion algorithm and the round iteration of the SM4 encryption and decryption algorithm, and that dedicated instructions can realize this parallelism, so that several round keys can be generated in parallel and several iterations of the encryption/decryption round function can be completed at once. Dedicated circuitry can support parallel pipelined execution of the SM4 round key generation instruction (VSM4KEY) and the SM4 round function iteration instruction (VSM4R). In addition, the round functions of the SM4 key expansion algorithm and the SM4 encryption algorithm contain many shift and XOR operations; implementing these directly in hardware logic effectively accelerates the round functions. Implementing the system parameters, the fixed parameters, and the SBOX(x) function in hardware logic markedly accelerates parameter handling and the nonlinear transformation $\tau$. Pipelined execution allows multiple unrelated SM4 computations to be executed in parallel.
This embodiment of the invention relates to an acceleration method for the SM4 block cipher algorithm. The method is based on an SM4 extended instruction set that adopts a RISC architecture: all instructions use a fixed-length 32-bit format, and source operands and results are 256 bits wide. As shown in FIG. 1, the SM4 extended instruction set includes an SM4 round key generation instruction (VSM4KEY) for accelerating the SM4 key expansion algorithm and an SM4 round function iteration instruction (VSM4R) for accelerating the SM4 encryption and decryption algorithm. Parallel pipelining and instruction-level parallelism are used to accelerate the SM4 block cipher algorithm: the VSM4KEY and VSM4R instructions execute in parallel pipelines, and several SM4 computations on unrelated data can proceed in parallel at different execution stages of the pipeline, each completing its own round key expansion and encryption or decryption.
When parallel pipelining and instruction-level parallelism are used to accelerate the SM4 block cipher algorithm, the encryption algorithm of the acceleration method proceeds as shown in FIG. 2 and comprises the following steps:
1) Generate the round key iteration initial value $(K_3, K_2, K_1, K_0)$ using general-purpose instructions of the general-purpose processor;
2) Taking the iteration initial value $(K_3, K_2, K_1, K_0)$ and the fixed parameters $CK_7 \sim CK_0$ as input, execute the 1st SM4 round key generation instruction (VSM4KEY) to generate the 8 round keys $rk_7 \sim rk_0$ of the SM4 block cipher algorithm;
3) Taking the round keys $rk_7 \sim rk_4$ generated in step 2) and the fixed parameters $CK_{15} \sim CK_8$ as input, execute the 2nd SM4 round key generation instruction (VSM4KEY) to generate the 8 round keys $rk_{15} \sim rk_8$ of the SM4 block cipher algorithm; at the same time, taking the round keys $rk_7 \sim rk_0$ generated in step 2) and two groups of data-independent plaintext $(W_3, W_2, W_1, W_0, W'_3, W'_2, W'_1, W'_0)$ as input, execute the 1st SM4 round function iteration instruction (VSM4R) to complete round function iterations 1 to 8 of the SM4 encryption algorithm and obtain the iterative working words $(W^{(7)}_3, W^{(7)}_2, W^{(7)}_1, W^{(7)}_0, W'^{(7)}_3, W'^{(7)}_2, W'^{(7)}_1, W'^{(7)}_0)$;
4) Taking the round keys $rk_{15} \sim rk_{12}$ generated in step 3) and the fixed parameters $CK_{23} \sim CK_{16}$ as input, execute the 3rd SM4 round key generation instruction (VSM4KEY) to generate the 8 round keys $rk_{23} \sim rk_{16}$ of the SM4 block cipher algorithm; at the same time, taking the round keys $rk_{15} \sim rk_8$ generated in step 3) and the iterative working words $(W^{(7)}_3, W^{(7)}_2, W^{(7)}_1, W^{(7)}_0, W'^{(7)}_3, W'^{(7)}_2, W'^{(7)}_1, W'^{(7)}_0)$ as input, execute the 2nd SM4 round function iteration instruction (VSM4R) to complete round function iterations 9 to 16 of the SM4 encryption algorithm and obtain the iterative working words $(W^{(15)}_3, W^{(15)}_2, W^{(15)}_1, W^{(15)}_0, W'^{(15)}_3, W'^{(15)}_2, W'^{(15)}_1, W'^{(15)}_0)$;
5) Taking the round keys $rk_{23} \sim rk_{20}$ generated in step 4) and the fixed parameters $CK_{31} \sim CK_{24}$ as input, execute the 4th SM4 round key generation instruction (VSM4KEY) to generate the 8 round keys $rk_{31} \sim rk_{24}$ of the SM4 block cipher algorithm; at the same time, taking the round keys $rk_{23} \sim rk_{16}$ generated in step 4) and the iterative working words $(W^{(15)}_3, W^{(15)}_2, W^{(15)}_1, W^{(15)}_0, W'^{(15)}_3, W'^{(15)}_2, W'^{(15)}_1, W'^{(15)}_0)$ as input, execute the 3rd SM4 round function iteration instruction (VSM4R) to complete round function iterations 17 to 24 of the SM4 encryption algorithm and obtain the iterative working words $(W^{(23)}_3, W^{(23)}_2, W^{(23)}_1, W^{(23)}_0, W'^{(23)}_3, W'^{(23)}_2, W'^{(23)}_1, W'^{(23)}_0)$;
6) Taking the round keys $rk_{31} \sim rk_{24}$ generated in step 5) and the iterative working words $(W^{(23)}_3, W^{(23)}_2, W^{(23)}_1, W^{(23)}_0, W'^{(23)}_3, W'^{(23)}_2, W'^{(23)}_1, W'^{(23)}_0)$ as input, execute the 4th SM4 round function iteration instruction (VSM4R) to complete round function iterations 25 to 32 of the SM4 encryption algorithm and obtain the iterative working words $(W^{(31)}_3, W^{(31)}_2, W^{(31)}_1, W^{(31)}_0, W'^{(31)}_3, W'^{(31)}_2, W'^{(31)}_1, W'^{(31)}_0)$;
7) Output the iterative working words $(W^{(31)}_3, W^{(31)}_2, W^{(31)}_1, W^{(31)}_0, W'^{(31)}_3, W'^{(31)}_2, W'^{(31)}_1, W'^{(31)}_0)$ from step 6) in reverse order to obtain the ciphertext result of the encryption algorithm $(Y_3, Y_2, Y_1, Y_0, Y'_3, Y'_2, Y'_1, Y'_0)$.
The decryption flow of the method for accelerating the SM4 block cipher algorithm is shown in Fig. 3 and comprises the following steps:
1) Generate the round key iteration initial value (K_3, K_2, K_1, K_0) using general-purpose instructions of the general-purpose processor;
2) With the iteration initial value (K_3, K_2, K_1, K_0) and the system fixed parameters CK_31~CK_0 as source operand data, execute the SM4 round key generation instruction (VSM4KEY) 4 times in sequence to generate the 32 round keys rk_31~rk_0 of the SM4 block cipher algorithm;
3) With the round keys rk_31~rk_0 generated in step 2) and the ciphertext (Y_3, Y_2, Y_1, Y_0, Y'_3, Y'_2, Y'_1, Y'_0) as input, execute the SM4 round function iteration instruction (VSM4R) 4 times in sequence (the round keys being supplied with rk_31~rk_24 first and rk_7~rk_0 last), completing the 32 round function iterations of the SM4 decryption algorithm and obtaining the iterative working words (W^(31)_3, W^(31)_2, W^(31)_1, W^(31)_0, W'^(31)_3, W'^(31)_2, W'^(31)_1, W'^(31)_0);
4) Output the iterative working words (W^(31)_3, W^(31)_2, W^(31)_1, W^(31)_0, W'^(31)_3, W'^(31)_2, W'^(31)_1, W'^(31)_0) of step 3) in reverse order to obtain the plaintext (W_3, W_2, W_1, W_0, W'_3, W'_2, W'_1, W'_0), which is the result of the decryption algorithm.
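The two flows above can be illustrated with a functional (not cycle-accurate) software model. This is only a sketch under stated assumptions: the function names vsm4key, vsm4r and selck are hypothetical stand-ins for the instructions; the constants FK, the CK formula and the linear transforms are taken from the published SM4 standard rather than from this text; words are kept in Python lists in index order (X_0 first); and for decryption the round keys are simply used in fully reversed order.

```python
# Functional model of the VSM4KEY / VSM4R flows (assumptions: names, list layout).
SBOX = bytes.fromhex(  # Table 2 of this document, row by row
    "d690e9fecce13db716b614c228fb2c05" "2b679a762abe04c3aa44132649860699"
    "9c4250f491ef987a33540b43edcfac62" "e4b31ca9c908e89580df94fa758f3fa6"
    "4707a7fcf37317ba83593c19e6854fa8" "686b81b27164da8bf8eb0f4b70569d35"
    "1e240e5e6358d1a225227c3b01217887" "d40046579fd327524c3602e7a0c4c89e"
    "eabf8ad240c738b5a3f7f2cef96115a1" "e0ae5da49b341a55ad933230f58cb1e3"
    "1df6e22e8266ca60c02923ab0d534e6f" "d5db3745defd8e2f03ff6a726d6c5b51"
    "8d1baf92bbddbc7f11d95c411f105ad8" "0ac13188a5cd7bbd2d74d012b8e5b4b0"
    "8969974a0c96777e65b9f109c56ec684" "18f07dec3adc4d2079ee5f3ed7cb3948"
)
MASK = 0xFFFFFFFF
FK = (0xA3B1BAC6, 0x56AA3350, 0x677D9197, 0xB27022DC)  # from the SM4 standard

def rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK

def tau(x):
    """Byte-wise S-box substitution of a 32-bit word."""
    return int.from_bytes(bytes(SBOX[b] for b in x.to_bytes(4, "big")), "big")

def selck(b):
    """Assumed SELCK(#b): CK_{8b}..CK_{8b+7}; byte j of CK_i is (4i+j)*7 mod 256."""
    return [int.from_bytes(bytes(((4 * i + j) * 7) % 256 for j in range(4)), "big")
            for i in range(8 * b, 8 * b + 8)]

def vsm4key(k, b):
    """Model of VSM4KEY: 4 previous intermediate keys -> 8 round keys."""
    k = list(k)
    for ck in selck(b):
        t = tau(k[-3] ^ k[-2] ^ k[-1] ^ ck)
        k.append(k[-4] ^ t ^ rotl(t, 13) ^ rotl(t, 23))
    return k[4:]

def vsm4r(blocks, rks):
    """Model of VSM4R: 8 rounds on each independent 4-word group."""
    out = []
    for w in blocks:
        w = list(w)
        for rk in rks:
            t = tau(w[-3] ^ w[-2] ^ w[-1] ^ rk)
            w.append(w[-4] ^ t ^ rotl(t, 2) ^ rotl(t, 10) ^ rotl(t, 18) ^ rotl(t, 24))
        out.append(w[-4:])
    return out

def encrypt(key, blocks):
    rk = vsm4key([m ^ f for m, f in zip(key, FK)], 0)     # steps 1-2
    for b in range(1, 5):                                 # steps 3-6
        nxt = vsm4key(rk[4:], b) if b < 4 else None       # issued in parallel in hardware
        blocks = vsm4r(blocks, rk)
        rk = nxt
    return [blk[::-1] for blk in blocks]                  # step 7: reverse-order output

def decrypt(key, blocks):
    k, rks = [m ^ f for m, f in zip(key, FK)], []
    for b in range(4):                                    # four VSM4KEY executions
        rks += vsm4key(k, b)
        k = rks[-4:]
    rks.reverse()                                         # rk_31 is used first
    for i in range(4):                                    # four VSM4R executions
        blocks = vsm4r(blocks, rks[8 * i:8 * i + 8])
    return [blk[::-1] for blk in blocks]

key = [0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210]
ct = encrypt(key, [key, key])          # two data-independent groups
print([f"{w:08x}" for w in ct[0]])
```

With the key and plaintext of the standard's published example, the model reproduces the standard ciphertext, and decrypt recovers the plaintext. The interleaving inside encrypt mirrors steps 3) to 6) only in ordering; real hardware would overlap the two instructions in time.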
The SM4 round key generation instruction (VSM4KEY) uses a multiple-round-key parallel generation algorithm, so that one execution of the instruction generates 8 round keys of the SM4 block cipher algorithm. The multiple SM4 round key parallel generation algorithm is shown in Fig. 4: it takes as input the previous four 32-bit intermediate keys K_3~K_0 and the eight 32-bit algorithm fixed parameters CK_7~CK_0 associated with the subsequent 8 round keys, and generates 8 round keys in a single execution during SM4 round key expansion.
The SM4 round function iteration instruction (VSM4R) uses a multi-round SM4 iteration parallel execution algorithm, so that one execution of the instruction completes 8 iterations of the SM4 encryption/decryption round function. The multi-round SM4 iteration parallel execution algorithm is shown in Fig. 5: it takes as input two data-independent groups of the current 4 intermediate words, W_3~W_0 and W'_3~W'_0, together with the 8 round keys rk_7~rk_0 used by the subsequent 8 rounds, and generates, for each group, the 4 intermediate words that result after 8 rounds of the SM4 encryption/decryption round function.
The SM4 round key generation instruction (VSM4KEY) uses the immediate-format simple operation instruction format. The instruction format is VSM4KEY Va, #b, Vc, which instructs the processor to operate on the operand in one 256-bit source register Va and an 8-bit immediate operand and to save the result in the 256-bit destination register Vc, as shown in Fig. 6, where [ 31.
The function of the SM4 round key generation instruction (VSM4KEY) is to generate the 8 subsequent round keys rk_7~rk_0 of the SM4 block cipher algorithm from the previous 4 intermediate keys K_3~K_0. K_3~K_0 are stored in the high 128 bits of the source register Va, the algorithm fixed parameters CK_7~CK_0 are determined by the immediate #b, and the result rk_7~rk_0 is stored in the destination register Vc. The SM4 round key generation instruction (VSM4KEY) performs the following operations:
[Formula images BDA0003897676470000101 and BDA0003897676470000111: operation definition of the VSM4KEY instruction; not reproduced in this text]
wherein the SELCK(#b) function determines the algorithm fixed parameters CK_i used in the key expansion of the SM4 block cipher algorithm: according to #b, it selects eight 32-bit values CK_7~CK_0. The specific values are shown in Table 1, where the data are hexadecimal numbers.
TABLE 1 Values of the SELCK(#b) function in the SM4 round key generation instruction (VSM4KEY)
[Table image BDA0003897676470000112; not reproduced in this text]
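Because Table 1 survives only as an image, the following sketch shows one way the selected constants could be regenerated. It assumes that #b = 0, 1, 2, 3 selects CK_7~CK_0, CK_15~CK_8, CK_23~CK_16 and CK_31~CK_24 respectively, and it uses the constant definition from the published SM4 standard (byte j of CK_i equals (4i + j) x 7 mod 256); the function names are hypothetical.

```python
def ck(i: int) -> int:
    """CK_i per the SM4 standard: byte j (most significant first) is (4*i + j) * 7 mod 256."""
    return int.from_bytes(bytes(((4 * i + j) * 7) % 256 for j in range(4)), "big")

def selck(b: int) -> list:
    """Assumed model of SELCK(#b): returns [CK_{8b}, ..., CK_{8b+7}]."""
    if not 0 <= b <= 3:
        raise ValueError("#b is assumed to lie in 0..3")
    return [ck(i) for i in range(8 * b, 8 * b + 8)]

# Print the four groups in hexadecimal, lowest-numbered constant first.
for b in range(4):
    print(b, " ".join(f"{c:08x}" for c in selck(b)))
```

A hardware implementation would more likely hold these 32 words in a small read-only table indexed by #b, as the description of the execution unit suggests; the formula is used here only to keep the sketch short.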
One execution of the SM4 round key generation instruction (VSM4KEY) generates 8 round keys of the SM4 block cipher algorithm. Executing the instruction 4 times in sequence, each time updating Va with the high 128 bits of the generated Vc and incrementing the immediate #b by 1, generates the 32 round keys rk_31~rk_0 of the SM4 block cipher algorithm.
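The operand chaining described above can be sketched with 256-bit registers modelled as Python integers. The helper names are hypothetical, and two points are assumptions: that rk_7 occupies the most significant 32-bit lane of Vc, and that the low 128 bits of Va are ignored by VSM4KEY (they are set to zero here).

```python
MASK32 = 0xFFFFFFFF

def pack(words):
    """Pack 32-bit words into one integer, first list element in the highest lane."""
    v = 0
    for w in words:
        v = (v << 32) | (w & MASK32)
    return v

def next_va(vc):
    """Va for the next VSM4KEY: the high 128 bits of Vc, kept in the high 128 bits."""
    return (vc >> 128) << 128

# Stand-in values 7..0 play the role of rk_7..rk_0 in Vc.
vc = pack([7, 6, 5, 4, 3, 2, 1, 0])
va = next_va(vc)              # now carries rk_7..rk_4 in its high half
print(f"{va:064x}")
```

Under this layout no shuffle instruction is needed between successive VSM4KEY executions; software only changes the immediate from #b to #b + 1.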
The SM4 round function iteration instruction (VSM4R) uses the register-format simple operation instruction format. The instruction format is VSM4R Va, Vb, Vc, which instructs the processor to operate on the two operands in the two 256-bit source registers Va and Vb and to save the result in the 256-bit destination register Vc, as shown in Fig. 7, where [ 31.
The function of the SM4 round function iteration instruction (VSM4R) is to generate, from the current 4 intermediate words W_3~W_0 and the 8 round keys rk_7~rk_0 of the subsequent 8 rounds, the 4 intermediate words W_11~W_8 that result after 8 rounds of the SM4 block cipher algorithm. Two such 128-bit operations can be completed in parallel: W_3~W_0 is stored in the high or low 128 bits of the source register Va, rk_7~rk_0 is stored in the source register Vb, and the result W_11~W_8 is stored in the corresponding high or low 128 bits of the destination register Vc. The SM4 round function iteration instruction (VSM4R) performs the following operations:
[Formula image BDA0003897676470000121: operation definition of the VSM4R instruction; not reproduced in this text]
One execution of the SM4 round function iteration instruction (VSM4R) completes 8 iterations of the encryption/decryption round function for two groups of SM4 data. Executing the instruction 4 times in sequence, each time updating Vb with 8 new round keys and updating Va with the generated Vc, produces the final 4 words of both groups.
In the operations defined by the SM4 extended instruction set, the SBOX(X) function performs table lookups in parallel on the 4 bytes (X[31:24], X[23:16], X[15:8], X[7:0]) of the 32-bit data X to obtain a new 32-bit value, that is, SBOX(X) = {SBOX(X[31:24]), SBOX(X[23:16]), SBOX(X[15:8]), SBOX(X[7:0])}. The specific values are shown in Table 2, where the data are hexadecimal numbers. Example: if the byte being looked up is 0xef, the result is the value in row e, column f of the table, so SBOX(0xef) = 0x84.
TABLE 2 SM4 extended instruction set SBOX (X) byte lookup
0 1 2 3 4 5 6 7 8 9 a b c d e f
0 d6 90 e9 fe cc e1 3d b7 16 b6 14 c2 28 fb 2c 05
1 2b 67 9a 76 2a be 04 c3 aa 44 13 26 49 86 06 99
2 9c 42 50 f4 91 ef 98 7a 33 54 0b 43 ed cf ac 62
3 e4 b3 1c a9 c9 08 e8 95 80 df 94 fa 75 8f 3f a6
4 47 07 a7 fc f3 73 17 ba 83 59 3c 19 e6 85 4f a8
5 68 6b 81 b2 71 64 da 8b f8 eb 0f 4b 70 56 9d 35
6 1e 24 0e 5e 63 58 d1 a2 25 22 7c 3b 01 21 78 87
7 d4 00 46 57 9f d3 27 52 4c 36 02 e7 a0 c4 c8 9e
8 ea bf 8a d2 40 c7 38 b5 a3 f7 f2 ce f9 61 15 a1
9 e0 ae 5d a4 9b 34 1a 55 ad 93 32 30 f5 8c b1 e3
a 1d f6 e2 2e 82 66 ca 60 c0 29 23 ab 0d 53 4e 6f
b d5 db 37 45 de fd 8e 2f 03 ff 6a 72 6d 6c 5b 51
c 8d 1b af 92 bb dd bc 7f 11 d9 5c 41 1f 10 5a d8
d 0a c1 31 88 a5 cd 7b bd 2d 74 d0 12 b8 e5 b4 b0
e 89 69 97 4a 0c 96 77 7e 65 b9 f1 09 c5 6e c6 84
f 18 f0 7d ec 3a dc 4d 20 79 ee 5f 3e d7 cb 39 48
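The byte-wise lookup defined for SBOX(X) can be sketched as follows. To keep the example short, only four entries copied from Table 2 are included (0x00, 0x01, 0xef, 0xff); a full model would hold all 256. The names sbox_word and TABLE2_SUBSET are hypothetical.

```python
# Four (input byte -> output byte) pairs read from Table 2; illustration only.
TABLE2_SUBSET = {0x00: 0xD6, 0x01: 0x90, 0xEF: 0x84, 0xFF: 0x48}

def sbox_word(x, table):
    """Apply the byte lookup to X[31:24], X[23:16], X[15:8], X[7:0] independently."""
    out = 0
    for shift in (24, 16, 8, 0):
        byte = (x >> shift) & 0xFF
        out |= table[byte] << shift   # each byte stays in its own position
    return out

print(hex(sbox_word(0x0001EFFF, TABLE2_SUBSET)))  # 0xd6908448
```

In software the four lookups run one after another; the text describes hardware that performs them in parallel, which this sketch does not attempt to model.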
An embodiment of the present invention further relates to an instruction set processor, as shown in Fig. 10, which comprises a register file, an SM4 round key generation instruction execution unit and an SM4 round function iteration instruction execution unit. The two execution units are placed on different execution pipelines and occupy different read and write ports of the register file. (They may instead be placed on the same execution pipeline and share the register file ports, but the VSM4KEY instruction and the VSM4R instruction then cannot be started or completed at the same time.) The execution latency of both the VSM4KEY instruction and the VSM4R instruction is 9 beats, and parallel pipelined execution of the two instructions is supported, which gives a higher operation speed.
The VSM4KEY instruction execution unit receives and executes the VSM4KEY instruction, as shown in Fig. 8. Its inputs are a 128-bit operand A (the preceding 4 intermediate keys, from the register file) and an 8-bit immediate operand B (used for a hardware table lookup that yields the system parameters CK_(i+7)~CK_(i+0)); its output is a 256-bit execution result (written back to the register file) consisting of 8 round keys of the SM4 block cipher algorithm. The unit is pipelined: one pass completes one VSM4KEY instruction (generating 8 round keys), and 4 passes in sequence generate the 32 round keys rk_31~rk_0. The unit has 8 iteration execution stages and 1 output stage, for a total execution latency of 9 beats, and supports pipelined execution. The hardware execution speed of the instruction, and hence the acceleration effect, is improved by implementing shift operations directly in hardware logic, processing the system parameters in dedicated hardware logic, implementing the SBOX operation as a hardware table lookup, and processing the 8 round keys in a parallel pipeline.
The VSM4R instruction execution unit receives and executes the VSM4R instruction, as shown in Fig. 9. Its input signals are a 256-bit operand A (two data-independent groups of intermediate iterative working words from the register file, each group comprising four 32-bit words) and a 256-bit operand B (eight 32-bit round keys from the register file); its output signal is a 256-bit execution result (written back to the register file) consisting of the two data-independent groups of iterative working words (each comprising four 32-bit SM4 iterative working words) after 8 rounds of iterative updating. The unit is pipelined: one pass completes one VSM4R instruction (8 iterations of the SM4 encryption/decryption round function), and 4 passes in sequence generate the final iteration results of the two data-independent groups (each comprising four 32-bit SM4 iterative working words). The unit has 8 iteration execution stages and 1 output stage, for a total execution latency of 9 beats, and supports pipelined execution. The hardware execution speed of the instruction, and hence the acceleration effect, is improved by implementing shift operations directly in hardware logic, processing the system parameters in dedicated hardware logic, implementing the SBOX operation as a hardware table lookup, and processing the 8 round function iterations in a parallel pipeline.
The invention is further described below through a specific example of accelerating the SM4 block cipher algorithm in a general-purpose processor.
Before the SM4 round key generation instruction (VSM4KEY) is executed for the first time for SM4 round key expansion, general-purpose instructions of the general-purpose processor are used to generate the round key iteration initial value (K_3, K_2, K_1, K_0) and to load this initial value and the plaintext or ciphertext into registers. The process of accelerating the encryption algorithm of the SM4 block cipher algorithm with the method and processor of the invention is described in detail below:
(1) In clock cycles 1 to 9 (beats 1 to 9), the processor accelerating the SM4 block cipher algorithm executes the 1st SM4 round key generation instruction (VSM4KEY). The initial inputs of the VSM4KEY instruction unit are the round key iteration initial value {K_3, K_2, K_1, K_0} (the high 128 bits of source operand A of the VSM4KEY instruction) and the 8-bit immediate operand 8'b00000000. During execution, idle execution stages of the VSM4KEY instruction unit and the VSM4R instruction unit can execute instructions of other data-independent SM4 block cipher programs. At the end of clock cycle 9 (beat 9), the VSM4KEY instruction unit completes the 1st VSM4KEY instruction and generates 8 round keys {rk_7~rk_0} of the SM4 block cipher algorithm.
(2) In clock cycles 10 to 18 (beats 10 to 18), the processor executes the 2nd SM4 round key generation instruction (VSM4KEY) and the 1st SM4 round function iteration instruction (VSM4R) in parallel. The inputs of the VSM4KEY instruction unit are the round keys {rk_7~rk_4} (the high 128 bits of source operand A of the VSM4KEY instruction) and the 8-bit immediate operand 8'b00000001. The inputs of the VSM4R instruction unit are two data-independent 128-bit plaintext groups {W^(0)_3~W^(0)_0, W'^(0)_3~W'^(0)_0} (source operand A of the VSM4R instruction) and a set of 256-bit round keys {rk_7~rk_0} (source operand B of the VSM4R instruction). During execution, idle execution stages of the VSM4KEY instruction unit and the VSM4R instruction unit can execute instructions of other data-independent SM4 block cipher programs. At the end of clock cycle 18 (beat 18), the VSM4KEY instruction unit completes the 2nd VSM4KEY instruction and generates 8 new round keys {rk_15~rk_8}; the VSM4R instruction unit completes the 1st VSM4R instruction, finishing round function iterations 1 to 8 of the SM4 encryption process and generating the intermediate iterative working words {W^(7)_3~W^(7)_0, W'^(7)_3~W'^(7)_0}.
(3) In clock cycles 19 to 27 (beats 19 to 27), the processor executes the 3rd SM4 round key generation instruction (VSM4KEY) and the 2nd SM4 round function iteration instruction (VSM4R) in parallel. The inputs of the VSM4KEY instruction unit are the round keys {rk_15~rk_12} (the high 128 bits of source operand A of the VSM4KEY instruction) and the 8-bit immediate operand 8'b00000010. The inputs of the VSM4R instruction unit are two data-independent 128-bit groups of intermediate iterative working words {W^(7)_3~W^(7)_0, W'^(7)_3~W'^(7)_0} (source operand A of the VSM4R instruction) and a set of 256-bit round keys {rk_15~rk_8} (source operand B of the VSM4R instruction). During execution, idle execution stages of the VSM4KEY instruction unit and the VSM4R instruction unit can execute instructions of other data-independent SM4 block cipher programs. At the end of clock cycle 27 (beat 27), the VSM4KEY instruction unit completes the 3rd VSM4KEY instruction and generates 8 new round keys {rk_23~rk_16}; the VSM4R instruction unit completes the 2nd VSM4R instruction, finishing round function iterations 9 to 16 of the SM4 encryption process and generating the intermediate iterative working words {W^(15)_3~W^(15)_0, W'^(15)_3~W'^(15)_0}.
(4) In clock cycles 28 to 36 (beats 28 to 36), the processor executes the 4th SM4 round key generation instruction (VSM4KEY) and the 3rd SM4 round function iteration instruction (VSM4R) in parallel. The inputs of the VSM4KEY instruction unit are the round keys {rk_23~rk_20} (the high 128 bits of source operand A of the VSM4KEY instruction) and the 8-bit immediate operand 8'b00000011. The inputs of the VSM4R instruction unit are two data-independent 128-bit groups of intermediate iterative working words {W^(15)_3~W^(15)_0, W'^(15)_3~W'^(15)_0} (source operand A of the VSM4R instruction) and a set of 256-bit round keys {rk_23~rk_16} (source operand B of the VSM4R instruction). During execution, idle execution stages of the VSM4KEY instruction unit and the VSM4R instruction unit can execute instructions of other data-independent SM4 block cipher programs. At the end of clock cycle 36 (beat 36), the VSM4KEY instruction unit completes the 4th VSM4KEY instruction and generates 8 new round keys {rk_31~rk_24}; the VSM4R instruction unit completes the 3rd VSM4R instruction, finishing round function iterations 17 to 24 of the SM4 encryption process and generating the intermediate iterative working words {W^(23)_3~W^(23)_0, W'^(23)_3~W'^(23)_0}.
(5) In clock cycles 37 to 45 (beats 37 to 45), the processor executes the 4th SM4 round function iteration instruction (VSM4R). The inputs of the VSM4R instruction unit are two data-independent 128-bit groups of intermediate iterative working words {W^(23)_3~W^(23)_0, W'^(23)_3~W'^(23)_0} (source operand A of the VSM4R instruction) and a set of 256-bit round keys {rk_31~rk_24} (source operand B of the VSM4R instruction). During execution, idle execution stages of the VSM4KEY instruction unit and the VSM4R instruction unit can execute instructions of other data-independent SM4 block cipher programs. At the end of clock cycle 45 (beat 45), the VSM4R instruction unit completes the 4th VSM4R instruction, finishing round function iterations 25 to 32 of the SM4 encryption process and generating the intermediate iterative working words {W^(31)_3~W^(31)_0, W'^(31)_3~W'^(31)_0}. At this point the processor has completed the round key expansion and the encryption round function iterations of one SM4 block cipher operation; the ciphertext is obtained simply by outputting the intermediate iterative working words {W^(31)_3~W^(31)_0, W'^(31)_3~W'^(31)_0} in reverse order.
Because the processor accelerating the SM4 block cipher algorithm supports fully pipelined execution of parallel instructions, under continuous pipelined execution the round key generation and encryption iterations of 9 data-independent groups of SM4 block cipher operations can be completed in as few as 54 beats, which greatly speeds up the encryption algorithm of the SM4 block cipher algorithm. If several instruction execution units for accelerating the SM4 block cipher algorithm are provided in the processor, the acceleration of the encryption algorithm can be enhanced further.
The following describes in detail the process of accelerating the decryption algorithm of the SM4 block cipher algorithm using the method of the present invention:
(1) In clock cycles 1 to 9 (beats 1 to 9), the processor accelerating the SM4 block cipher algorithm executes the 1st SM4 round key generation instruction (VSM4KEY). The initial inputs of the VSM4KEY instruction unit are the round key iteration initial value {K_3, K_2, K_1, K_0} (the high 128 bits of source operand A of the VSM4KEY instruction) and the 8-bit immediate operand 8'b00000000. During execution, idle execution stages of the VSM4KEY instruction unit and the VSM4R instruction unit can execute instructions of other data-independent SM4 block cipher programs. At the end of clock cycle 9 (beat 9), the VSM4KEY instruction unit completes the 1st VSM4KEY instruction and generates 8 round keys {rk_7~rk_0} of the SM4 block cipher algorithm.
(2) In clock cycles 10 to 18 (beats 10 to 18), the processor executes the 2nd SM4 round key generation instruction (VSM4KEY). The inputs of the VSM4KEY instruction unit are the round keys {rk_7~rk_4} (the high 128 bits of source operand A of the VSM4KEY instruction) and the 8-bit immediate operand 8'b00000001. During execution, idle execution stages of the VSM4KEY instruction unit and the VSM4R instruction unit can execute instructions of other data-independent SM4 block cipher programs. At the end of clock cycle 18 (beat 18), the VSM4KEY instruction unit completes the 2nd VSM4KEY instruction and generates 8 new round keys {rk_15~rk_8}.
(3) In clock cycles 19 to 27 (beats 19 to 27), the processor executes the 3rd SM4 round key generation instruction (VSM4KEY). The inputs of the VSM4KEY instruction unit are the round keys {rk_15~rk_12} (the high 128 bits of source operand A of the VSM4KEY instruction) and the 8-bit immediate operand 8'b00000010. During execution, idle execution stages of the VSM4KEY instruction unit and the VSM4R instruction unit can execute instructions of other data-independent SM4 block cipher programs. At the end of clock cycle 27 (beat 27), the VSM4KEY instruction unit completes the 3rd VSM4KEY instruction and generates 8 new round keys {rk_23~rk_16}.
(4) In clock cycles 28 to 36 (beats 28 to 36), the processor executes the 4th SM4 round key generation instruction (VSM4KEY). The inputs of the VSM4KEY instruction unit are the round keys {rk_23~rk_20} (the high 128 bits of source operand A of the VSM4KEY instruction) and the 8-bit immediate operand 8'b00000011. During execution, idle execution stages of the VSM4KEY instruction unit and the VSM4R instruction unit can execute instructions of other data-independent SM4 block cipher programs. At the end of clock cycle 36 (beat 36), the VSM4KEY instruction unit completes the 4th VSM4KEY instruction and generates 8 new round keys {rk_31~rk_24}.
(5) In clock cycles 37 to 45 (beats 37 to 45), the processor executes the 1st SM4 round function iteration instruction (VSM4R). The inputs of the VSM4R instruction unit are two data-independent 128-bit ciphertext groups {W^(0)_3~W^(0)_0, W'^(0)_3~W'^(0)_0} (source operand A of the VSM4R instruction) and a set of 256-bit round keys {rk_31~rk_24} (source operand B of the VSM4R instruction). During execution, idle execution stages of the VSM4KEY instruction unit and the VSM4R instruction unit can execute instructions of other data-independent SM4 block cipher programs. At the end of clock cycle 45 (beat 45), the VSM4R instruction unit completes the 1st VSM4R instruction, finishing round function iterations 1 to 8 of the SM4 decryption process and generating the intermediate iterative working words {W^(7)_3~W^(7)_0, W'^(7)_3~W'^(7)_0}.
(6) In clock cycles 46 to 54 (beats 46 to 54), the processor executes the 2nd SM4 round function iteration instruction (VSM4R). The inputs of the VSM4R instruction unit are two data-independent 128-bit groups of intermediate iterative working words {W^(7)_3~W^(7)_0, W'^(7)_3~W'^(7)_0} (source operand A of the VSM4R instruction) and a set of 256-bit round keys {rk_23~rk_16} (source operand B of the VSM4R instruction). During execution, idle execution stages of the VSM4KEY instruction unit and the VSM4R instruction unit can execute instructions of other data-independent SM4 block cipher programs. At the end of clock cycle 54 (beat 54), the VSM4R instruction unit completes the 2nd VSM4R instruction, finishing round function iterations 9 to 16 of the SM4 decryption process and generating the intermediate iterative working words {W^(15)_3~W^(15)_0, W'^(15)_3~W'^(15)_0}.
(7) In clock cycles 55 to 63 (beats 55 to 63), the processor executes the 3rd SM4 round function iteration instruction (VSM4R). The inputs of the VSM4R instruction unit are two data-independent 128-bit groups of intermediate iterative working words {W^(15)_3~W^(15)_0, W'^(15)_3~W'^(15)_0} (source operand A of the VSM4R instruction) and a set of 256-bit round keys {rk_15~rk_8} (source operand B of the VSM4R instruction). During execution, idle execution stages of the VSM4KEY instruction unit and the VSM4R instruction unit can execute instructions of other data-independent SM4 block cipher programs. At the end of clock cycle 63 (beat 63), the VSM4R instruction unit completes the 3rd VSM4R instruction, finishing round function iterations 17 to 24 of the SM4 decryption process and generating the intermediate iterative working words {W^(23)_3~W^(23)_0, W'^(23)_3~W'^(23)_0}.
(8) In clock cycles 64 to 72 (beats 64 to 72), the processor executes the 4th SM4 round function iteration instruction (VSM4R). The inputs of the VSM4R instruction unit are two data-independent 128-bit groups of intermediate iterative working words {W^(23)_3~W^(23)_0, W'^(23)_3~W'^(23)_0} (source operand A of the VSM4R instruction) and a set of 256-bit round keys {rk_7~rk_0} (source operand B of the VSM4R instruction). During execution, idle execution stages of the VSM4KEY instruction unit and the VSM4R instruction unit can execute instructions of other data-independent SM4 block cipher programs. At the end of clock cycle 72 (beat 72), the VSM4R instruction unit completes the 4th VSM4R instruction, finishing round function iterations 25 to 32 of the SM4 decryption process and generating the intermediate iterative working words {W^(31)_3~W^(31)_0, W'^(31)_3~W'^(31)_0}. At this point the processor has completed the round key expansion and the decryption round function iterations of one SM4 block cipher operation; the decrypted plaintext is obtained simply by outputting the intermediate iterative working words {W^(31)_3~W^(31)_0, W'^(31)_3~W'^(31)_0} in reverse order.
Because the processor accelerating the SM4 block cipher algorithm supports pipelined execution of parallel instructions, under continuous pipelined execution the round key generation and decryption iterations of 9 data-independent groups of SM4 block cipher operations can be completed in as few as 81 beats, which greatly speeds up the decryption algorithm of the SM4 block cipher algorithm. If several instruction execution units for accelerating the SM4 block cipher algorithm are provided in the processor, the acceleration of the decryption algorithm can be enhanced further.
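The single-group beat counts in the two walkthroughs (45 beats for encryption, 72 for decryption) follow from the 9-beat latency and the data dependencies between instructions. A minimal sketch of that arithmetic is given below; the instruction labels and the scheduling rule (an instruction starts on the beat after its last input is ready, and the two units never block each other) are simplifying assumptions, not a description of the actual issue logic, and the multi-group figures of 54 and 81 beats are not modelled.

```python
LATENCY = 9  # beats per VSM4KEY or VSM4R instruction, as stated in the text

def finish_beats(program):
    """program: list of (name, [names it depends on]); returns name -> last beat."""
    done = {}
    for name, deps in program:
        start = max((done[d] for d in deps), default=0) + 1
        done[name] = start + LATENCY - 1
    return done

# Encryption: each VSM4R overlaps with the next VSM4KEY.
ENC = [("KEY1", []), ("KEY2", ["KEY1"]), ("R1", ["KEY1"]),
       ("KEY3", ["KEY2"]), ("R2", ["KEY2", "R1"]),
       ("KEY4", ["KEY3"]), ("R3", ["KEY3", "R2"]),
       ("R4", ["KEY4", "R3"])]

# Decryption: the first VSM4R needs rk_31~rk_24, so it waits for the 4th VSM4KEY.
DEC = [("KEY1", []), ("KEY2", ["KEY1"]), ("KEY3", ["KEY2"]), ("KEY4", ["KEY3"]),
       ("R1", ["KEY4"]), ("R2", ["KEY3", "R1"]),
       ("R3", ["KEY2", "R2"]), ("R4", ["KEY1", "R3"])]

print(finish_beats(ENC)["R4"], finish_beats(DEC)["R4"])  # 45 72
```

The difference between the two totals comes entirely from the dependency of the first decryption rounds on the last round keys, which prevents the overlap that the encryption flow exploits.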

Claims (8)

1. A method for accelerating the SM4 block cipher algorithm, characterized in that, based on an SM4 extended instruction set, parallel pipelining and instruction-level parallelism are used to accelerate the SM4 block cipher algorithm, which comprises an SM4 key expansion algorithm and an SM4 encryption and decryption algorithm; the SM4 extended instruction set adopts a RISC architecture, its instructions use a fixed-length 32-bit format, and its source and destination operands are 256 bits; the SM4 extended instruction set comprises an SM4 round key generation instruction and an SM4 round function iteration instruction; the SM4 round key generation instruction uses a multiple SM4 round key parallel generation algorithm to accelerate the SM4 key expansion algorithm, wherein the algorithm takes as input the previous four 32-bit intermediate keys K_3~K_0 and the eight 32-bit algorithm fixed parameters CK_7~CK_0 associated with the subsequent 8 round keys, and generates 8 round keys in a single execution during SM4 round key expansion; the SM4 round function iteration instruction uses a multi-round SM4 iteration parallel execution algorithm to accelerate the SM4 encryption and decryption algorithm, wherein the algorithm takes as input two data-independent groups of the current 4 intermediate words, W_3~W_0 and W'_3~W'_0, together with the 8 round keys rk_7~rk_0 used by the subsequent 8 rounds, and generates, for each group, the 4 intermediate words that result after 8 rounds of the SM4 encryption/decryption round function.
2. The method for accelerating the SM4 block cipher algorithm according to claim 1, characterized in that, when parallel pipelining and instruction-level parallelism are used to accelerate the SM4 block cipher algorithm:
the encryption method comprises the following steps:
(A) generating the round key iteration initial value (K_3, K_2, K_1, K_0) using general-purpose instructions of a general-purpose processor;
(B) with the round key iteration initial value (K_3, K_2, K_1, K_0) and the system fixed parameters CK_7~CK_0 as input, executing the 1st SM4 round key generation instruction to generate the 8 round keys rk_7~rk_0 of the SM4 block cipher algorithm;
(C) with the round keys rk_7~rk_4 and the system fixed parameters CK_15~CK_8 as input, executing the 2nd SM4 round key generation instruction to generate the 8 round keys rk_15~rk_8 of the SM4 block cipher algorithm; at the same time, with the round keys rk_7~rk_0 and two unrelated groups of plaintext (W_3, W_2, W_1, W_0, W'_3, W'_2, W'_1, W'_0) as input, executing the 1st SM4 round function iteration instruction, which completes round function iterations 1 to 8 of the SM4 encryption algorithm and yields the iteration working words (W^(7)_3, W^(7)_2, W^(7)_1, W^(7)_0, W'^(7)_3, W'^(7)_2, W'^(7)_1, W'^(7)_0);
(D) with the round keys rk_15~rk_12 and the system fixed parameters CK_23~CK_16 as input, executing the 3rd SM4 round key generation instruction to generate the 8 round keys rk_23~rk_16 of the SM4 block cipher algorithm; at the same time, with the round keys rk_15~rk_8 and the iteration working words (W^(7)_3, W^(7)_2, W^(7)_1, W^(7)_0, W'^(7)_3, W'^(7)_2, W'^(7)_1, W'^(7)_0) as input, executing the 2nd SM4 round function iteration instruction, which completes round function iterations 9 to 16 of the SM4 encryption algorithm and yields the iteration working words (W^(15)_3, W^(15)_2, W^(15)_1, W^(15)_0, W'^(15)_3, W'^(15)_2, W'^(15)_1, W'^(15)_0);
(E) with the round keys rk_23~rk_20 and the system fixed parameters CK_31~CK_24 as input, executing the 4th SM4 round key generation instruction to generate the 8 round keys rk_31~rk_24 of the SM4 block cipher algorithm; at the same time, with the round keys rk_23~rk_16 and the iteration working words (W^(15)_3, W^(15)_2, W^(15)_1, W^(15)_0, W'^(15)_3, W'^(15)_2, W'^(15)_1, W'^(15)_0) as input, executing the 3rd SM4 round function iteration instruction (VSM4R), which completes round function iterations 17 to 24 of the SM4 encryption algorithm and yields the iteration working words (W^(23)_3, W^(23)_2, W^(23)_1, W^(23)_0, W'^(23)_3, W'^(23)_2, W'^(23)_1, W'^(23)_0);
(F) with the round keys rk_31~rk_24 and the iteration working words (W^(23)_3, W^(23)_2, W^(23)_1, W^(23)_0, W'^(23)_3, W'^(23)_2, W'^(23)_1, W'^(23)_0) as input, executing the 4th SM4 round function iteration instruction, which completes round function iterations 25 to 32 of the SM4 encryption algorithm and yields the iteration working words (W^(31)_3, W^(31)_2, W^(31)_1, W^(31)_0, W'^(31)_3, W'^(31)_2, W'^(31)_1, W'^(31)_0);
(G) outputting the iteration working words (W^(31)_3, W^(31)_2, W^(31)_1, W^(31)_0, W'^(31)_3, W'^(31)_2, W'^(31)_1, W'^(31)_0) in reverse order to obtain the ciphertext (Y_3, Y_2, Y_1, Y_0, Y'_3, Y'_2, Y'_1, Y'_0) that is the result of the encryption algorithm;
the decryption method comprises the following steps:
(a) generating the round key iteration initial value (K_3, K_2, K_1, K_0) using general-purpose instructions of a general-purpose processor;
(b) with the round key iteration initial value (K_3, K_2, K_1, K_0) and the system fixed parameters CK_31~CK_0 as source operand data, executing the SM4 round key generation instruction 4 times in sequence to generate the 32 round keys rk_31~rk_0 of the SM4 block cipher algorithm;
(c) with the round keys rk_31~rk_0 and the ciphertext (Y_3, Y_2, Y_1, Y_0, Y'_3, Y'_2, Y'_1, Y'_0) as input, executing the SM4 round function iteration instruction 4 times in sequence, which completes the 32 round function iterations of the SM4 decryption algorithm and yields the iteration working words (W^(31)_3, W^(31)_2, W^(31)_1, W^(31)_0, W'^(31)_3, W'^(31)_2, W'^(31)_1, W'^(31)_0);
(d) outputting the iteration working words (W^(31)_3, W^(31)_2, W^(31)_1, W^(31)_0, W'^(31)_3, W'^(31)_2, W'^(31)_1, W'^(31)_0) in reverse order to obtain the plaintext (W_3, W_2, W_1, W_0, W'_3, W'_2, W'_1, W'_0) that is the result of the decryption algorithm.
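The overlap described in encryption steps (B) to (F) can be sketched as an issue schedule in which round key generation instruction n+1 is paired with round function iteration instruction n. The Python below is only an illustration: the function name `encryption_schedule` and the tuple representation are assumptions made for this sketch, and only the mnemonics VSM4KEY and VSM4R are taken from the claims.

```python
def encryption_schedule(num_groups=4):
    """Return issue slots for one encryption, mirroring steps (B) to (F).

    Each slot is a list of (mnemonic, ordinal) pairs that may issue together.
    Hypothetical helper: the representation is an assumption of this sketch.
    """
    slots = []
    for step in range(num_groups + 1):
        slot = []
        if step < num_groups:
            # The (step+1)-th key instruction produces the next 8 round keys.
            slot.append(("VSM4KEY", step + 1))
        if step >= 1:
            # The step-th round instruction consumes the 8 round keys that
            # the key instruction of the previous slot produced.
            slot.append(("VSM4R", step))
        slots.append(slot)
    return slots


# Print one line per claimed step (B) through (F).
for label, slot in zip("BCDEF", encryption_schedule()):
    print(label, slot)
```

Under this pairing, the eight instructions of one encryption occupy five issue slots instead of eight.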
3. The method for accelerating the SM4 block cipher algorithm according to claim 1, characterized in that the SM4 round key generation instruction uses the immediate-form simple operation instruction format, specifically VSM4KEY Va, #b, Vc, which instructs that one operand in the 256-bit source register Va be operated on together with an 8-bit immediate operand and that the result be stored in the 256-bit destination register Vc, the [ 31.
4. The method for accelerating the SM4 block cipher algorithm according to claim 1, characterized in that the SM4 round key generation instruction specifically generates the next 8 round keys rk_7~rk_0 of the SM4 block cipher algorithm from the previous 4 intermediate keys K_3~K_0; wherein the intermediate keys K_3~K_0 are stored in the high 128 bits of the source register Va, the fixed algorithm parameters CK_7~CK_0 are determined by the immediate #b, and the generated result rk_7~rk_0 is stored in the destination register Vc; the results rk_7~rk_0 are respectively equal to the intermediate keys K_11~K_4; for i equal to 0, 1, 2, 3, 4, 5, 6, 7, the generation logic of K_{i+4} is: K_{i+4} = K_i XOR Temp2 XOR (Temp2 <<< 13) XOR (Temp2 <<< 23), where XOR denotes bitwise exclusive OR and <<< denotes cyclic left shift; Temp2 is a 32-bit intermediate variable word whose generation logic is: Temp2[31:0] = SBOX(Temp1), where the SBOX(X) function performs table lookups in parallel on the 4 bytes of X to obtain a new 32-bit value; Temp1 is a 32-bit intermediate variable word whose generation logic is: Temp1[31:0] = K_{i+1} XOR K_{i+2} XOR K_{i+3} XOR CK_i, where CK_i is a 32-bit fixed parameter of the algorithm and {CK_7~CK_0} = SELCK(#b); the SELCK(#b) function determines the fixed algorithm parameters CK_i used in the key expansion process of the SM4 block cipher algorithm, that is, the specific values of the eight 32-bit data CK_7~CK_0 can be determined from #b according to the SM4 block cipher algorithm; executing the SM4 round key generation instruction once generates 8 round keys of the SM4 block cipher algorithm; executing it 4 times in sequence, each time updating the high 128 bits of the source register Va with the high 128 bits of the generated destination register Vc and incrementing the immediate #b by 1, generates the 32 round keys rk_31~rk_0 of the SM4 block cipher algorithm.
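The generation logic of claim 4 can be modelled in software as follows. This is a minimal sketch under stated assumptions, not the claimed hardware: registers are modelled as Python lists of 32-bit words, `selck` assumes that #b starts at 0 and selects CK_{8b}~CK_{8b+7}, and `stand_in_sbox` is a placeholder byte permutation that is NOT the real SM4 S-box table. The CK_i formula (byte j of CK_i equals (4i + j) * 7 mod 256) is the one defined by the SM4 standard.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits (the <<< operator)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def ck_word(i):
    """CK_i per the SM4 standard: byte j of CK_i is (4*i + j) * 7 mod 256."""
    word = 0
    for j in range(4):
        word = (word << 8) | (((4 * i + j) * 7) & 0xFF)
    return word


def selck(b):
    """Hypothetical SELCK: immediate b in 0..3 selects [CK_8b, ..., CK_8b+7]."""
    return [ck_word(8 * b + i) for i in range(8)]


def stand_in_sbox(byte):
    """Placeholder byte permutation for illustration; NOT the SM4 S-box."""
    return (byte * 7 + 3) & 0xFF


def sbox32(x, sbox=stand_in_sbox):
    """Substitute each of the 4 bytes of x independently (SBOX(X))."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= sbox((x >> shift) & 0xFF) << shift
    return out


def vsm4key(k, b, sbox=stand_in_sbox):
    """Model of one round key generation instruction.

    k is [K_0, K_1, K_2, K_3]; returns [rk_0, ..., rk_7] = [K_4, ..., K_11].
    """
    ck = selck(b)
    words = list(k)
    for i in range(8):
        temp1 = words[i + 1] ^ words[i + 2] ^ words[i + 3] ^ ck[i]
        temp2 = sbox32(temp1, sbox)
        words.append(words[i] ^ temp2 ^ rotl32(temp2, 13) ^ rotl32(temp2, 23))
    return words[4:]


def expand_key(k):
    """Four chained executions: rk_4..rk_7 become the next K_0..K_3."""
    round_keys = []
    for b in range(4):
        rk = vsm4key(k, b)
        round_keys.extend(rk)
        k = rk[4:]
    return round_keys
```

Feeding only the last four generated words back into the next execution is what allows the instruction to take a 128-bit key operand while producing a 256-bit result.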
5. The method for accelerating the SM4 block cipher algorithm according to claim 1, characterized in that the SM4 round function iteration instruction uses the register-form simple operation instruction format, specifically VSM4R Va, Vb, Vc, which instructs that the two operands in the two 256-bit source registers Va and Vb be operated on and that the result be stored in the 256-bit destination register Vc, the [ 31.
6. The method for accelerating the SM4 block cipher algorithm according to claim 1, characterized in that the SM4 round function iteration instruction specifically generates, from the current 4 intermediate words W_3~W_0 and the 8 round keys rk_7~rk_0 used by the next 8 rounds, the 4 intermediate words W_11~W_8 that result from 8 iterations of the SM4 block cipher algorithm, and completes two such 128-bit operations in parallel; wherein W_3~W_0 are stored in the high or low 128 bits of the source register Va, rk_7~rk_0 are stored in the source register Vb, and the generated result W_11~W_8 is stored in the corresponding high or low 128 bits of the destination register Vc; in the 8 iterations, for j equal to 0, 1, 2, 3, 4, 5, 6, 7, the generation logic of the result W_{j+4} is: W_{j+4} = W_j XOR Temp2 XOR (Temp2 <<< 2) XOR (Temp2 <<< 10) XOR (Temp2 <<< 18) XOR (Temp2 <<< 24), where XOR denotes bitwise exclusive OR and <<< denotes cyclic left shift; Temp2 is a 32-bit intermediate variable word whose generation logic is: Temp2[31:0] = SBOX(Temp1), where the SBOX(X) function performs table lookups in parallel on the 4 bytes of the 32-bit data X to obtain a new 32-bit value; Temp1 is a 32-bit intermediate variable word whose generation logic is: Temp1[31:0] = W_{j+1} XOR W_{j+2} XOR W_{j+3} XOR rk_j; executing the SM4 round function iteration instruction once completes 8 iterations of the encryption/decryption round function for two groups of SM4 block cipher data; executing it 4 times in sequence, each time updating the source register Vb with 8 new round keys and updating the data in the source register Va with the generated data in the destination register Vc, generates the final 4 words of each of the two groups of SM4 block cipher data.
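The iteration logic of claim 6 can be modelled in the same way. The sketch below is illustrative only: the two 128-bit halves of Va are modelled as a pair of 4-word lists, the helper names (`eight_rounds`, `vsm4r`, `crypt_pair`) are hypothetical, and `stand_in_sbox` is again a placeholder byte permutation rather than the real SM4 S-box. Decryption with the round keys supplied in reverse order follows the SM4 standard; it holds for this structure regardless of the substitution table used.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits (the <<< operator)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def stand_in_sbox(byte):
    """Placeholder byte permutation for illustration; NOT the SM4 S-box."""
    return (byte * 7 + 3) & 0xFF


def sbox32(x, sbox=stand_in_sbox):
    """Substitute each of the 4 bytes of x independently (SBOX(X))."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= sbox((x >> shift) & 0xFF) << shift
    return out


def linear_l(t):
    """Temp2 XOR (Temp2<<<2) XOR (Temp2<<<10) XOR (Temp2<<<18) XOR (Temp2<<<24)."""
    return t ^ rotl32(t, 2) ^ rotl32(t, 10) ^ rotl32(t, 18) ^ rotl32(t, 24)


def eight_rounds(w, rk):
    """w is [W_0..W_3], rk is [rk_0..rk_7]; returns [W_8..W_11]."""
    words = list(w)
    for j in range(8):
        temp1 = words[j + 1] ^ words[j + 2] ^ words[j + 3] ^ rk[j]
        temp2 = sbox32(temp1)
        words.append(words[j] ^ linear_l(temp2))
    return words[8:]


def vsm4r(va, vb):
    """Model of one round function iteration instruction.

    va is a pair of unrelated 4-word groups; vb holds the 8 round keys,
    which are shared by both groups.
    """
    return tuple(eight_rounds(lane, vb) for lane in va)


def crypt_pair(blocks, round_keys):
    """Four chained executions, then reverse the word order of each group."""
    state = blocks
    for n in range(4):
        state = vsm4r(state, round_keys[8 * n: 8 * n + 8])
    return tuple(lane[::-1] for lane in state)
```

With this model, `crypt_pair(ct, rks[::-1])` recovers the two input groups from `ct = crypt_pair(pt, rks)`, and changing one group never affects the output of the other.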
7. An instruction set processor is characterized by comprising a register file, an SM4 round key generation instruction execution unit and an SM4 round function iteration instruction execution unit, wherein the SM4 round key generation instruction execution unit and the SM4 round function iteration instruction execution unit are placed on different execution pipelines and respectively occupy different read and write ports of the register file;
the SM4 round key generation instruction execution unit has:
two input terminals, for inputting a 128-bit operand A and an 8-bit immediate operand B, respectively;
an output for outputting a 256-bit execution result;
the SM4 round key generation instruction execution unit directly realizes shift operation and system parameter processing by adopting hardware logic, and realizes SBOX operation and 8 round key pipeline parallel processing by adopting hardware table lookup; the SM4 round key generation instruction execution unit can execute the SM4 round key generation instruction in a pipeline manner;
the SM4 round function iteration instruction execution unit is provided with:
two input terminals, for inputting a 256-bit operand A and a 256-bit operand B, respectively;
an output for outputting a 256-bit execution result;
the SM4 round function iteration instruction execution unit directly realizes shift operation and system parameter processing by adopting hardware logic, realizes SBOX operation and 8 round function pipeline parallel processing by adopting hardware table lookup, and can execute the SM4 round function iteration instruction in a pipeline manner.
8. The instruction set processor according to claim 7, characterized in that the SM4 round key generation instruction execution unit has 8 iteration execution stages and 1 output stage, for a total execution latency of 9 beats; the SM4 round function iteration instruction execution unit has 8 iteration execution stages and 1 output stage, for a total execution latency of 9 beats; and the instruction set processor supports parallel pipelined execution of the SM4 round key generation instruction and the SM4 round function iteration instruction.
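The effect of the 9-beat latency can be estimated with a simple timing model. The Python below is a rough sketch, not a description of the claimed processor: it assumes that a dependent instruction issues on the beat at which its producer's result becomes available (no earlier forwarding) and that each pipeline accepts one new instruction per beat; the function names are hypothetical.

```python
LATENCY = 9  # beats: 8 iteration execution stages + 1 output stage


def overlapped_encrypt_beats(groups=4, latency=LATENCY):
    """Beats for one pair of blocks when key instruction n+1 issues
    alongside round instruction n, giving groups + 1 dependent steps."""
    return (groups + 1) * latency


def serialized_encrypt_beats(groups=4, latency=LATENCY):
    """Beats if the 2 * groups instructions ran strictly one after another."""
    return 2 * groups * latency


def pipelined_beats(n_independent, latency=LATENCY):
    """Beats to finish n independent instructions issued one per beat."""
    return latency + n_independent - 1


print(overlapped_encrypt_beats(), serialized_encrypt_beats(), pipelined_beats(16))
```

Under these assumptions, overlapping the two instruction types shortens the dependent chain of one encryption from 8 instruction latencies to 5, and independent block pairs can fill the remaining pipeline slots.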
CN202211280193.1A 2022-10-19 2022-10-19 Acceleration method of SM4 block cipher algorithm and instruction set processor Pending CN115658148A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211280193.1A CN115658148A (en) 2022-10-19 2022-10-19 Acceleration method of SM4 block cipher algorithm and instruction set processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211280193.1A CN115658148A (en) 2022-10-19 2022-10-19 Acceleration method of SM4 block cipher algorithm and instruction set processor

Publications (1)

Publication Number Publication Date
CN115658148A true CN115658148A (en) 2023-01-31

Family

ID=84989773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211280193.1A Pending CN115658148A (en) 2022-10-19 2022-10-19 Acceleration method of SM4 block cipher algorithm and instruction set processor

Country Status (1)

Country Link
CN (1) CN115658148A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117873431A (en) * 2024-03-13 2024-04-12 杭州金智塔科技有限公司 Random number generation method and device based on SM4 cryptographic algorithm


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination