CN115134070A

CN115134070A - Method, device and equipment for realizing block cipher algorithm

Info

Publication number: CN115134070A
Application number: CN202210607651.1A
Authority: CN
Inventors: 王宇辰; 洪澄
Original assignee: Alibaba China Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2022-05-31
Filing date: 2022-05-31
Publication date: 2022-09-30

Abstract

The application discloses a method, a device and equipment for realizing a block cipher algorithm. The method comprises the following steps: acquiring plaintext data to be encrypted; encrypting plaintext data to be encrypted by using a first block cipher algorithm to obtain ciphertext data, wherein calculation of a round function in the first block cipher algorithm is realized by using an instruction set, the round function comprises nonlinear transformation, and the instruction set comprises an instruction for solving the nonlinear transformation; and outputting the ciphertext data. The method can improve the operation efficiency of the block cipher algorithm.

Description

Method, device and equipment for realizing block cipher algorithm

Technical Field

The present application relates to the field of computer security technologies, and in particular, to a method, an apparatus, and a program for implementing a block cipher algorithm.

Background

A block cipher algorithm is a cipher algorithm that processes a block of data of a particular length at a time. For example, the SM4 algorithm is a commonly used block cipher algorithm, which is mainly used for data encryption. The SM4 algorithm comprises an encryption and decryption algorithm and a key expansion algorithm, and both the encryption and decryption algorithm and the key expansion algorithm adopt 32-round nonlinear iteration structures.

Conventionally, a block cipher algorithm (for example, the SM4 algorithm) can be implemented in software. Such an implementation must execute a large number of operations and correspondingly many instructions, so it incurs high system overhead and achieves low operation efficiency.

Disclosure of Invention

The application provides a method, a device and equipment for realizing a block cipher algorithm, which can improve the operation efficiency of realizing the block cipher algorithm.

The embodiment of the application provides a method for realizing a block cipher algorithm, which comprises the following steps: acquiring plaintext data to be encrypted; encrypting the plaintext data to be encrypted by using a first block cipher algorithm to obtain ciphertext data, wherein a round function in the first block cipher algorithm is calculated by using an instruction set, the round function comprises nonlinear transformation, and the instruction set comprises an instruction for solving the nonlinear transformation; and outputting the ciphertext data.

Optionally, encrypting the plaintext data to be encrypted by using the first block cipher algorithm to obtain ciphertext data, wherein a round function in the first block cipher algorithm is calculated by using an instruction set, includes: obtaining input data of the nonlinear transformation, wherein the input data of the nonlinear transformation is determined according to a round key and the plaintext data, the input data of the nonlinear transformation is data in a first finite field, and the first finite field is the finite field of the first block cipher algorithm; in a second finite field, implementing the nonlinear transformation on the input data of the nonlinear transformation by using the instruction set to obtain a result of the nonlinear transformation, wherein the result of the nonlinear transformation is data in the second finite field, the second finite field is the finite field of a second block cipher algorithm, the second finite field has an isomorphic relationship with the first finite field, and the first block cipher algorithm is different from the second block cipher algorithm; and obtaining the ciphertext data according to the result of the nonlinear transformation, the ciphertext data being data in the first finite field.

Optionally, the nonlinear transformation includes a first affine transformation and a first inverse affine transformation, and implementing the nonlinear transformation by using the instruction set in the second finite field to obtain the result of the nonlinear transformation includes: performing the first affine transformation on first data, second data and the input data of the nonlinear transformation by using a first instruction in the instruction set to obtain a first affine transformation result, wherein the first data is data obtained by mapping a preset matrix into the second finite field according to an isomorphic matrix, the second data is data obtained by mapping a preset vector into the second finite field according to the isomorphic matrix, and the isomorphic matrix indicates the isomorphic relationship; and performing the first inverse affine transformation on the preset matrix, the preset vector and the first affine transformation result by using a second instruction in the instruction set to obtain the result of the nonlinear transformation.

Optionally, the nonlinear transformation includes a first affine transformation, a second affine transformation, and an inverse transformation, and implementing the nonlinear transformation by using the instruction set in the second finite field to obtain the result of the nonlinear transformation includes: performing the first affine transformation on first data, second data and a mapping result of the input data by using a first instruction in the instruction set to obtain a first affine transformation result, wherein the mapping result of the input data is data obtained by mapping the input data into the second finite field according to an isomorphic matrix, the first data is data obtained by mapping a preset matrix into the second finite field according to the isomorphic matrix, the second data is data obtained by mapping a preset vector into the second finite field according to the isomorphic matrix, and the isomorphic matrix indicates the isomorphic relationship; performing the inverse transformation on the first affine transformation result by using a second instruction in the instruction set to obtain a result of the inverse transformation; and performing the second affine transformation on the first data, the second data, and the result of the inverse transformation by using the first instruction to obtain the result of the nonlinear transformation.

Optionally, the round function further includes a linear transformation, and obtaining the ciphertext data according to a result of the nonlinear transformation includes: performing the linear transformation on the result of the nonlinear transformation to obtain a result of the linear transformation; performing an exclusive-or operation on the result of the linear transformation and an internal state of the first block cipher algorithm, the internal state of the first block cipher algorithm being associated with the input data of the non-linear transformation, to obtain an output result of the round function; and when the round key is a key used by the last round of iterative computation in the first block cipher algorithm, performing reverse order arrangement on mapping results of output results of round functions obtained by the last round of iterative computation in the first block cipher algorithm, determining the results obtained by the reverse order arrangement as the ciphertext data, wherein the mapping results of the output results of the round functions obtained by any round of iterative computation are results obtained by mapping the output results of the round functions obtained by any round of iterative computation to the first finite field by using the isomorphic matrix.

Optionally, the performing the linear transformation on the result of the nonlinear transformation to obtain the result of the linear transformation includes: performing a cyclic shift operation on the result of the nonlinear transformation to obtain a cyclic shift operation result; and executing the linear transformation on the cyclic shift operation result according to a linear preset matrix set by using the first instruction to obtain a linear transformation result, wherein the number of preset matrixes included in the preset matrix set is related to the cyclic shift operation result.

Optionally, the nonlinear transformation includes a first affine transformation and a first inverse affine transformation, and implementing the nonlinear transformation on the input data of the nonlinear transformation by using the instruction set in the second finite field to obtain the result of the nonlinear transformation includes:

performing the first affine transformation on first data, second data and a mapping result of the input data by using a first instruction in the instruction set, to obtain the first affine transformation result, wherein the mapping result of the input data is data that maps the input data into the second finite field according to an isomorphic matrix, the first data is data that maps a preset matrix into the second finite field according to an isomorphic matrix, the second data is data that maps a preset vector into the second finite field according to the isomorphic matrix, and the isomorphic matrix is used for indicating the isomorphic relation; performing a cyclic shift operation on the first affine transformation result to obtain a cyclic shift operation result; performing the first inverse affine transformation on the cyclic shift operation result, third data and fourth data by using a second instruction in the instruction set, and obtaining a result of the nonlinear transformation, wherein the third data is data mapping a preset matrix set and a dot product result of the preset matrix into the second finite field, and the fourth data is data mapping the preset matrix set and the preset vector into the second finite field.

Optionally, the obtaining the ciphertext data according to the result of the nonlinear transformation includes: performing an exclusive-or operation on the result of the nonlinear transformation and an internal state of the first block cipher algorithm, the internal state of the first block cipher algorithm being associated with the input data of the nonlinear transformation, to obtain an output result of the round function; and when the round key is a key used by the last round of iterative computation in the first block cipher algorithm, performing reverse order arrangement on mapping results of output results of round functions obtained by the last round of iterative computation in the first block cipher algorithm, determining the results obtained by the reverse order arrangement as the ciphertext data, wherein the mapping results of the output results of the round functions obtained by any round of iterative computation are results obtained by mapping the output results of the round functions obtained by any round of iterative computation to the first finite field by using the isomorphic matrix.

Optionally, the method further includes: and mapping the input data of the nonlinear transformation to the second finite field by using a first instruction in the instruction set and the isomorphic matrix to obtain a mapping result of the input data.

Optionally, the first block cipher algorithm is the Chinese national standard SM4 algorithm, and the second block cipher algorithm is the Advanced Encryption Standard (AES).
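By way of illustration only, the isomorphic relationship between the two finite fields can be sketched in Python. This is a minimal sketch and not the implementation of the present application. It assumes that the SM4 S-box works in GF(2^8) modulo x^8+x^7+x^6+x^5+x^4+x^2+1 (0x1F5, taken from public analyses of SM4, not from the text above) and that AES works in GF(2^8) modulo x^8+x^4+x^3+x+1 (0x11B). The names `gf_mul`, `gf_pow`, `find_root` and `to_aes_field` are hypothetical. The sketch finds one isomorphism by searching the AES field for a root of the SM4 polynomial; other roots give other, equally valid isomorphic matrices.

```python
SM4_POLY = 0x1F5  # assumed SM4 field polynomial: x^8+x^7+x^6+x^5+x^4+x^2+1
AES_POLY = 0x11B  # AES field polynomial: x^8+x^4+x^3+x+1


def gf_mul(a, b, poly):
    """Multiply two bytes in GF(2^8) defined by the 9-bit polynomial poly."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_pow(a, n, poly):
    """Raise a to the n-th power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a, poly)
    return result


def find_root(poly_to_solve, field_poly):
    """Return the smallest element of the field that is a root of poly_to_solve."""
    for beta in range(2, 256):
        value = 0
        for k in range(9):
            if (poly_to_solve >> k) & 1:
                value ^= gf_pow(beta, k, field_poly)
        if value == 0:
            return beta
    raise ValueError("no root found")


BETA = find_root(SM4_POLY, AES_POLY)
# Column i of the isomorphic matrix is the image of x^i, that is BETA^i.
ISO_COLUMNS = [gf_pow(BETA, i, AES_POLY) for i in range(8)]


def to_aes_field(x):
    """Map a byte from the assumed SM4 field into the AES field (GF(2)-linear)."""
    y = 0
    for i in range(8):
        if (x >> i) & 1:
            y ^= ISO_COLUMNS[i]
    return y
```

Because the map is linear over GF(2), it can be written as an 8x8 bit matrix, which is the form an affine instruction can apply to many bytes at once.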

Optionally, the instruction set is a GFNI instruction set, the first instruction is a VGF2P8AFFINEQB instruction, and the second instruction is a VGF2P8AFFINEINVQB instruction.
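By way of illustration only, the per-byte behavior of the two named instructions can be modeled in software. The following Python sketch follows the publicly documented per-byte semantics of the GFNI affine instructions: output bit i is the parity of the data byte ANDed with byte (7 - i) of a 64-bit matrix operand, XORed with bit i of an 8-bit immediate, and the inverse variant first inverts the data byte in the AES field. It is a reference model under that assumption, not the code of the present application, and the function names and the constant name `AES_AFFINE_MATRIX` are hypothetical. The real instructions apply the same operation to every byte of a vector register.

```python
AES_POLY = 0x11B  # reduction polynomial used by the inverse variant


def parity(x):
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def gf_mul(a, b):
    """Multiply two bytes in the AES field GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return result


def gf_inv(x):
    """Field inverse computed as x^254; maps 0 to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, x)
    return result


def gf2p8affineqb(byte, matrix, imm8):
    """Model of the affine instruction on one byte: matrix * byte + imm8 over GF(2)."""
    out = 0
    for i in range(8):
        row = (matrix >> (8 * (7 - i))) & 0xFF  # matrix row for output bit i
        out |= (parity(row & byte) ^ ((imm8 >> i) & 1)) << i
    return out


def gf2p8affineinvqb(byte, matrix, imm8):
    """Model of the inverse-affine instruction: matrix * inv(byte) + imm8."""
    return gf2p8affineqb(gf_inv(byte), matrix, imm8)


# Matrix and constant of the AES S-box affine step in this row ordering.
AES_AFFINE_MATRIX = 0xF1E3C78F1F3E7CF8
IDENTITY_MATRIX = 0x0102040810204080
```

With `AES_AFFINE_MATRIX` and the constant 0x63, `gf2p8affineinvqb` evaluates the AES S-box. Computing an S-box of another cipher this way requires additional affine steps that move data into and out of the AES field.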

Optionally, any one of the instructions included in the instruction set is invoked through an application programming interface (API) in any one of the following languages: assembly language, C, or C++.

Optionally, the width of the operand of any instruction in the instruction set is equal to the bit width of the register associated with that operand.

The embodiment of the application provides a method for realizing a block cipher algorithm, which comprises the following steps: acquiring ciphertext data to be decrypted; decrypting the ciphertext data to be decrypted by using a first block cipher algorithm to obtain plaintext data, wherein calculation of a round function in the first block cipher algorithm is realized by using an instruction set, the round function comprises nonlinear transformation, and the instruction set comprises an instruction for solving the nonlinear transformation; and outputting the plaintext data.

It is understood that the decryption algorithm provided by the embodiments of the present application works on the same principle as the encryption algorithm; for details not described in this section, refer to the encryption flow described above. For example, when the first block cipher algorithm is the SM4 algorithm, the SM4 encryption algorithm and decryption algorithm operate on the same principle, except that the decryption algorithm uses the round keys in the reverse of the order used by the encryption algorithm. When implementing the SM4 encryption algorithm, the round keys used in the 32 iterations of the round function are $(rk_0, rk_1, \ldots, rk_{31})$. Replacing the round keys $(rk_0, rk_1, \ldots, rk_{31})$ in the above method with $(rk_{31}, rk_{30}, \ldots, rk_0)$ therefore yields the SM4 decryption algorithm.

The embodiment of the present application provides a device for implementing a block cipher algorithm, including: an acquisition unit configured to acquire plaintext data to be encrypted; the processing unit is used for encrypting the plaintext data to be encrypted by utilizing a first block cipher algorithm to obtain ciphertext data, wherein the calculation of a round function in the first block cipher algorithm is realized by utilizing an instruction set, the round function comprises nonlinear transformation, and the instruction set comprises an instruction for solving the nonlinear transformation; and the output unit is used for outputting the ciphertext data.

Optionally, the processing unit is further configured to: obtain input data of the nonlinear transformation, wherein the input data of the nonlinear transformation is determined according to a round key and the plaintext data, the input data of the nonlinear transformation is data in a first finite field, and the first finite field is the finite field of the first block cipher algorithm; in a second finite field, implement the nonlinear transformation on the input data of the nonlinear transformation by using the instruction set to obtain a result of the nonlinear transformation, wherein the result of the nonlinear transformation is data in the second finite field, the second finite field is the finite field of a second block cipher algorithm, the second finite field has an isomorphic relationship with the first finite field, and the first block cipher algorithm is different from the second block cipher algorithm; and obtain the ciphertext data according to the result of the nonlinear transformation, the ciphertext data being data in the first finite field.

Optionally, the non-linear transformation includes a first affine transformation and a first inverse affine transformation, and the processing unit is further configured to: performing the first affine transformation on first data, second data and the input data of the non-linear transformation by using a first instruction in the instruction set, to obtain the first affine transformation result, wherein the first data is data for mapping a preset matrix into the second finite field according to an isomorphic matrix, the second data is data for mapping a preset vector into the second finite field according to the isomorphic matrix, and the isomorphic matrix is used for indicating the isomorphic relationship; and executing the first inverse affine transformation on the preset matrix, the preset vector and the first affine transformation result by using a second instruction in the instruction set to obtain a result of the nonlinear transformation.

Optionally, the non-linear transformation includes a first affine transformation, a second affine transformation, and an inverse transformation, and the processing unit is further configured to: performing a first affine transformation on first data, second data and a mapping result of the input data by using a first instruction in the instruction set, to obtain the first affine transformation result, wherein the mapping result of the input data is data that maps the input data into the second finite field according to an isomorphic matrix, the first data is data that maps a preset matrix into the second finite field according to an isomorphic matrix, the second data is data that maps a preset vector into the second finite field according to the isomorphic matrix, and the isomorphic matrix is used for indicating the isomorphic relationship; performing the inverse transformation on the first affine transformation result by using a second instruction in the instruction set to obtain an inverse transformation result; performing a second affine transformation on the first data, the second data, and the result of the inverse transformation using the first instruction, obtaining a result of the non-linear transformation.

Optionally, the round function further includes a linear transformation, and the processing unit is further configured to: performing the linear transformation on the result of the nonlinear transformation to obtain a result of the linear transformation; performing an exclusive-or operation on the result of the linear transformation and an internal state of the first block cipher algorithm, the internal state of the first block cipher algorithm being associated with the input data of the non-linear transformation, to obtain an output result of the round function; and when the round key is a key used by the last round of iterative computation in the first block cipher algorithm, performing reverse order arrangement on mapping results of output results of round functions obtained by the last round of iterative computation in the first block cipher algorithm, determining the results obtained by the reverse order arrangement as the ciphertext data, wherein the mapping results of the output results of the round functions obtained by any round of iterative computation are results obtained by mapping the output results of the round functions obtained by any round of iterative computation to the first finite field by using the isomorphic matrix.

Optionally, the processing unit is further configured to: performing a cyclic shift operation on the result of the nonlinear transformation to obtain a cyclic shift operation result; and executing the linear transformation on the cyclic shift operation result according to a linear preset matrix set by using the first instruction to obtain a linear transformation result, wherein the number of preset matrixes included in the preset matrix set is related to the cyclic shift operation result.

Optionally, the non-linear transformation includes a first affine transformation and a first inverse affine transformation, and the processing unit is further configured to: performing the first affine transformation on first data, second data and a mapping result of the input data by using a first instruction in the instruction set, to obtain the first affine transformation result, wherein the mapping result of the input data is data that maps the input data into the second finite field according to an isomorphic matrix, the first data is data that maps a preset matrix into the second finite field according to an isomorphic matrix, and the second data is data that maps a preset vector into the second finite field according to the isomorphic matrix, and the isomorphic matrix is used for indicating the isomorphic relationship; performing a cyclic shift operation on the first affine transformation result to obtain a cyclic shift operation result; performing the first inverse affine transformation on the cyclic shift operation result, third data and fourth data by using a second instruction in the instruction set, and obtaining a result of the nonlinear transformation, wherein the third data is data mapping a preset matrix set and a dot product result of the preset matrix into the second finite field, and the fourth data is data mapping the preset matrix set and the preset vector into the second finite field.

Optionally, the processing unit is further configured to: performing an exclusive-or operation on the result of the nonlinear transformation and an internal state of the first block cipher algorithm, the internal state of the first block cipher algorithm being associated with the input data of the nonlinear transformation, to obtain an output result of the round function; and when the round key is a key used by the last round of iterative computation in the first block cipher algorithm, performing reverse order arrangement on mapping results of output results of round functions obtained by the last round of iterative computation in the first block cipher algorithm, determining the results obtained by the reverse order arrangement as the ciphertext data, wherein the mapping results of the output results of the round functions obtained by any round of iterative computation are results obtained by mapping the output results of the round functions obtained by any round of iterative computation to the first finite field by using the isomorphic matrix.

Optionally, the processing unit is further configured to: and mapping the input data of the nonlinear transformation to the second finite field by using a first instruction in the instruction set and the isomorphic matrix to obtain a mapping result of the input data.

Optionally, any one of the instructions included in the instruction set is invoked through an application programming interface (API) in any one of the following languages: assembly language, C, or C++.

Optionally, the width of the operand of any instruction in the instruction set is equal to the bit width of the register associated with that operand.

The embodiment of the application provides a device for realizing a block cipher algorithm, which comprises: an acquisition unit configured to acquire ciphertext data to be decrypted; the processing unit is used for decrypting the ciphertext data to be decrypted by utilizing a first block cipher algorithm to obtain plaintext data, wherein the calculation of a round function in the first block cipher algorithm is realized by utilizing an instruction set, the round function comprises nonlinear transformation, and the instruction set comprises an instruction for solving the nonlinear transformation; and the output unit is used for outputting the plaintext data.

Embodiments of the present application further provide a storage device, where the storage device stores program instructions executable by a processor to perform the method described above.

An embodiment of the present application further provides an electronic device, including: a processor; and a memory for storing a data processing program, the electronic device executing the method described above after being powered on and running the program through the processor.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments disclosed herein, nor do they necessarily limit the scope of the present disclosure. Other features disclosed in the present application will become apparent from the following description.

Compared with the prior art, the method has the following advantages:

The method for implementing a block cipher algorithm provided by the application comprises the following steps: acquiring plaintext data to be encrypted; encrypting the plaintext data to be encrypted by using a first block cipher algorithm to obtain ciphertext data, wherein calculation of a round function in the first block cipher algorithm is realized by using an instruction set, the round function comprises nonlinear transformation, and the instruction set comprises an instruction for solving the nonlinear transformation; and outputting the ciphertext data. Because the first block cipher algorithm is realized by using the instruction set, and the instruction set comprises an instruction for solving the nonlinear transformation, the nonlinear transformation is solved directly by instructions in the instruction set, which improves the operation efficiency.

Drawings

Fig. 1A is an application scenario of a method for implementing a block cipher algorithm according to an embodiment of the present application.

Fig. 1 is a schematic diagram of a method for implementing a block cipher algorithm according to an embodiment of the present application.

Fig. 2 is a schematic diagram of S120 in the method illustrated in fig. 1 described above.

Fig. 3 is a schematic diagram of one implementation of S220 in the method illustrated in fig. 2 above.

Fig. 4 is a schematic diagram of another implementation of S220 in the method illustrated in fig. 2.

Fig. 5 is a schematic diagram of still another implementation of S220 in the method illustrated in fig. 2.

Fig. 6 is a schematic diagram of a method for implementing round functions in a block cipher algorithm according to an embodiment of the present application.

Fig. 7 is a schematic diagram of a method for implementing round functions in a block cipher algorithm according to an embodiment of the present application.

Fig. 8 is a schematic diagram of a method for implementing round functions in a block cipher algorithm according to an embodiment of the present application.

Fig. 9 is a schematic diagram of a method for implementing round functions in a block cipher algorithm according to an embodiment of the present application.

Fig. 10A is a schematic diagram of a method for implementing an SM4 encryption algorithm according to an embodiment of the present application.

Fig. 10 is a block diagram of an apparatus for implementing a block cipher algorithm according to an embodiment of the present application.

Fig. 11 is a structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The application can, however, be implemented in many ways other than those described here, and those skilled in the art can make similar extensions without departing from its spirit; the application is therefore not limited to the specific implementations disclosed below.

With the rapid development of internet technology, more and more devices (e.g., user terminals or servers) communicate through a network to realize data transmission. In order to improve data security and protect privacy, when data is transmitted between devices through a network, a sending end device usually encrypts the data to be transmitted according to a predetermined algorithm to obtain ciphertext data, and then transmits the ciphertext data to a destination end device through the network. And after receiving the ciphertext data, the destination terminal equipment decrypts the ciphertext data according to the corresponding decryption algorithm to obtain plaintext data.

The block cipher algorithm is a data encryption and decryption algorithm which is commonly used at present. A block cipher algorithm is a type of cipher algorithm that can only process a block of data of a particular length at a time. For example, the block cipher algorithm includes an SM4 algorithm, a Data Encryption Algorithm (DEA), or an Advanced Encryption Standard (AES).
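By way of illustration only, processing data one fixed-length block at a time can be sketched as follows. The sketch assumes the 128-bit (16-byte) block length shared by SM4 and AES; the helper name `split_blocks` is hypothetical, and padding of a final partial block is outside its scope.

```python
BLOCK_BYTES = 16  # 128-bit block length, as used by SM4 and AES


def split_blocks(data: bytes, block_bytes: int = BLOCK_BYTES):
    """Split data into consecutive fixed-length blocks.

    A block cipher processes exactly one such block per invocation, so the
    input length must already be a multiple of the block length.
    """
    if len(data) % block_bytes:
        raise ValueError("data length must be a multiple of the block length")
    return [data[i:i + block_bytes] for i in range(0, len(data), block_bytes)]
```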

In the conventional technology, a block cipher algorithm may be implemented in software, but such an implementation must execute a large number of operations and correspondingly many instructions, so it incurs high system overhead and achieves low operation efficiency. Some software implementations, for example those based on the bitslicing technique, must pack 128 or 256 plaintext data blocks together before processing, which limits their generality and practicality. A block cipher algorithm can also be implemented in hardware, but this requires dedicated hardware to be installed in the device, so hardware implementations also suffer from poor generality.

Based on this, the present application provides a method, an apparatus, and a device for implementing a block cipher algorithm, so as to solve the above problems in the conventional technology.

First, technical terms related to embodiments of the present application are briefly described:

1. SM4 algorithm

The SM4 algorithm is the current block cipher standard in China, issued by the State Cryptography Administration of China on March 21, 2012; the relevant standard is GM/T 0002-2012, "SM4 Block Cipher Algorithm". The block length is 128 bits, the key length is 128 bits, and both the encryption algorithm and the key expansion algorithm adopt a 32-round nonlinear iteration structure.

The SM4 algorithm consists of an encryption and decryption algorithm and a key expansion algorithm. The block length and the key length of the SM4 algorithm are both 128 bits, and the encryption algorithm and the key expansion algorithm each iterate for 32 rounds. Encryption and decryption use the same algorithm, but with the round keys in reverse order. Key expansion algorithm: the SM4 algorithm uses a 128-bit encryption key and a 32-round iterative structure in which each round uses one 32-bit round key, for a total of 32 round keys. The 32 round keys are therefore generated from the encryption key by the key expansion algorithm.

For better understanding of the implementation of the block cipher algorithm provided in the present application, the encryption/decryption algorithm and the key expansion algorithm of the SM4 algorithm are briefly described below. The SM4 algorithm uses a 32-round nonlinear iterative structure with a word (32 bits) as the unit, and each iteration is one evaluation of a round function F. Illustratively, when the input data of the SM4 algorithm is $(X_i, X_{i+1}, X_{i+2}, X_{i+3})$, where $X_i$, $X_{i+1}$, $X_{i+2}$ and $X_{i+3}$ are each one word (4 words in total), the encryption algorithm in the SM4 algorithm can be represented by the following formula:

$$X_{i+4} = F(X_i, X_{i+1}, X_{i+2}, X_{i+3}, rk_i) = X_i \oplus T(X_{i+1} \oplus X_{i+2} \oplus X_{i+3} \oplus rk_i), \quad i = 0, 1, \ldots, 31$$

$$(Y_0, Y_1, Y_2, Y_3) = R(X_{32}, X_{33}, X_{34}, X_{35}) = (X_{35}, X_{34}, X_{33}, X_{32}) \tag{1}$$

wherein the function $F(\cdot)$ represents the round function; $rk_i$ represents the round key of round $i$; the function $R(\cdot)$ represents the reverse-order transformation; $(X_i, X_{i+1}, X_{i+2}, X_{i+3})$ represents the input data of the round function in round $i$, which is 4 words and 128 bits in total; $(X_{i+1}, X_{i+2}, X_{i+3}, X_{i+4})$ represents the output data of the round function in round $i$; and $(Y_0, Y_1, Y_2, Y_3)$ represents the output obtained after the 32nd iteration of the round function, namely the ciphertext data, which is 4 words and 128 bits. It is understood that the function $T(\cdot)$ in formula (1) is an invertible transformation composed of a nonlinear transformation $\tau$ and a linear transformation $L$, i.e., $T(\cdot) = L(\tau(\cdot))$, where the output of the nonlinear transformation is the input of the linear transformation.

Nonlinear transformation: composed of four parallel S-boxes. The input of the nonlinear transformation is A = (a_i, a_{i+1}, a_{i+2}, a_{i+3}) and the output of the nonlinear transformation is B = (b_i, b_{i+1}, b_{i+2}, b_{i+3}), where each element of A and B is 8 bits. For example, a_i is 8 bits and b_i is 8 bits.

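Linear transformation: the input of the linear transformation is the output of the nonlinear transformation, and both the input and the output of the linear transformation are 32 bits. The linear transformation can be represented by the following formula:

$$L(B) = B \oplus (B \lll 2) \oplus (B \lll 10) \oplus (B \lll 18) \oplus (B \lll 24) \tag{2}$$

where B is the 32-bit input and B <<< n denotes a cyclic left shift of B by n bits.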

It will be appreciated that the decryption algorithm and the encryption algorithm of the SM4 algorithm work on the same principle; the only difference is that the decryption algorithm uses the round keys in the reverse of the order used by the encryption algorithm. For example, if the order of the round keys used by the encryption algorithm when performing the 32 rounds of iterative computation is (rk_0, rk_1, ..., rk_31), then the order of the round keys used by the decryption algorithm when performing the 32 rounds of iterative computation is (rk_31, rk_30, ..., rk_0).
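The round structure and the reversed round-key order described above can be sketched in Python as follows. This is a minimal illustration, not the implementation of the present application: the rotation amounts in `linear_l` are those of the SM4 standard, whereas `TOY_SBOX` is a hypothetical placeholder permutation (not the SM4 S-box) and `round_keys` are arbitrary demonstration values, so the output is not real SM4 ciphertext.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical placeholder S-box (NOT the SM4 S-box); it only exercises the structure.
TOY_SBOX = [(7 * v + 3) % 256 for v in range(256)]


def rotl32(word, n):
    """Cyclic left shift of a 32-bit word by n bits."""
    return ((word << n) | (word >> (32 - n))) & MASK32


def tau(word, sbox):
    """Nonlinear transformation: apply the S-box to each of the 4 bytes of a word."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= sbox[(word >> shift) & 0xFF] << shift
    return out


def linear_l(b):
    """Linear transformation L with the rotation amounts of the SM4 standard."""
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24)


def crypt(block, round_keys, sbox):
    """Run the 32-round iteration, then the reverse-order transformation R."""
    x = list(block)
    for rk in round_keys:
        # X_{i+4} = X_i xor T(X_{i+1} xor X_{i+2} xor X_{i+3} xor rk_i)
        x.append(x[-4] ^ linear_l(tau(x[-3] ^ x[-2] ^ x[-1] ^ rk, sbox)))
    return tuple(reversed(x[-4:]))


round_keys = [(0x9E3779B9 * (i + 1)) & MASK32 for i in range(32)]  # arbitrary demo keys
plaintext = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)
ciphertext = crypt(plaintext, round_keys, TOY_SBOX)
# Decryption is the same routine with the round keys in reverse order.
recovered = crypt(ciphertext, round_keys[::-1], TOY_SBOX)
```

Because each round only XORs a function of three state words into the fourth, the reversal property holds for any choice of S-box and round keys, which is why the placeholder values suffice here.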

Key expansion algorithm: assume that the encryption key is MK = (MK_0, MK_1, MK_2, MK_3), the system parameter is FK = (FK_0, FK_1, FK_2, FK_3), and the fixed parameter is CK = (CK_0, CK_1, ..., CK_31). rk_i denotes a round key, and the round keys are generated from the encryption key.

The initial round key may be represented by the following formula:

$$(K_0, K_1, K_2, K_3) = (MK_0 \oplus FK_0,\ MK_1 \oplus FK_1,\ MK_2 \oplus FK_2,\ MK_3 \oplus FK_3) \tag{3}$$

The round key rk_i can be expressed by the following formula:

$$rk_i = K_{i+4} = K_i \oplus T'(K_{i+1} \oplus K_{i+2} \oplus K_{i+3} \oplus CK_i), \quad i = 0, 1, \ldots, 31 \tag{4}$$

where the function T'() = L'(τ()), and the linear transformation L'() can be expressed by the following formula:

$$L'(B) = B \oplus (B \lll 13) \oplus (B \lll 23) \tag{5}$$

It is to be understood that the function T'() built on formula (5) in the above key expansion algorithm is obtained by replacing the linear transformation L() in the function T() of the above encryption/decryption algorithm with L'().
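The key expansion steps above can be sketched as follows. The constants FK and the rule for generating CK are taken from the SM4 standard; `TOY_SBOX` is again a hypothetical placeholder (not the SM4 S-box), so the round keys produced here illustrate the data flow only, and the function name `expand_key` is chosen for illustration.

```python
MASK32 = 0xFFFFFFFF

# System parameter FK from the SM4 standard.
FK = (0xA3B1BAC6, 0x56AA3350, 0x677D9197, 0xB27022DC)

# Fixed parameter CK: byte j of CK_i is ((4 * i + j) * 7) mod 256.
CK = [
    sum((((4 * i + j) * 7) % 256) << (24 - 8 * j) for j in range(4))
    for i in range(32)
]

# Hypothetical placeholder S-box (NOT the SM4 S-box).
TOY_SBOX = [(7 * v + 3) % 256 for v in range(256)]


def rotl32(word, n):
    """Cyclic left shift of a 32-bit word by n bits."""
    return ((word << n) | (word >> (32 - n))) & MASK32


def tau(word, sbox):
    """Apply the S-box to each of the 4 bytes of a word."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= sbox[(word >> shift) & 0xFF] << shift
    return out


def linear_l_prime(b):
    """Linear transformation L' used only in key expansion."""
    return b ^ rotl32(b, 13) ^ rotl32(b, 23)


def expand_key(mk, sbox):
    """Derive 32 round keys from the 4-word encryption key MK."""
    k = [m ^ f for m, f in zip(mk, FK)]  # (K_0..K_3) = MK xor FK
    for i in range(32):
        # rk_i = K_{i+4} = K_i xor T'(K_{i+1} xor K_{i+2} xor K_{i+3} xor CK_i)
        k.append(k[i] ^ linear_l_prime(tau(k[i + 1] ^ k[i + 2] ^ k[i + 3] ^ CK[i], sbox)))
    return k[4:]


demo_round_keys = expand_key((0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210), TOY_SBOX)
```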

2. SM4 finite field

The SM4 finite field is the finite field over which the round function of the SM4 algorithm is defined. Each element of the finite field is 8 bits long, the finite field contains 2^8 elements in total, and an element of the finite field may be represented as a byte. The SM4 finite field is also known as the GF_SM4(2^8) finite field, i.e., "SM4 finite field" and "GF_SM4(2^8) finite field" have the same meaning.

3. SM4 S-box

The SM4 S-box may also be referred to as the SM4 S-box transformation, the S-box, or the S-box transformation. The S-box is one of the operations in the round function of the SM4 algorithm. The input and the output of the S-box are both 8-bit elements; for the specific definition, refer to "nonlinear transformation" in the GM/T 0002-2012 "SM4 block cipher algorithm" standard. The S-box may be regarded as a composition of a series of operations on the SM4 finite field, specifically: performing an affine transformation on the input data; inverting the result of that affine transformation; and performing another affine transformation on the result of the inversion.

4. SM4 L layer

The SM4 L layer may also be referred to as the SM4 L-layer transformation or the L-layer transformation. The L layer is another operation in the round function of the SM4 algorithm. The input of the L layer includes the output of the SM4 S-box, and the input and the output of the L layer are both 32-bit elements; for the specific definition, refer to "linear transformation" in the GM/T 0002-2012 "SM4 block cipher algorithm" standard. The L-layer operation can be viewed as a linear transformation acting on a 32-bit word, or it can be split into a combination of shifts and linear transformations acting on each byte.

5. Advanced Encryption Standard (AES) algorithm

The AES algorithm is the advanced encryption standard issued by NIST in November 2001, and is a current cryptographic algorithm standard of the U.S. federal government.

6. AES finite field

The AES finite field is the finite field over which the round function of the AES algorithm is defined. Each element of the finite field is 8 bits long, the finite field contains 2^8 elements in total, and an element of the finite field may be represented as a byte. The AES finite field is also known as the GF_AES(2^8) finite field, i.e., "AES finite field" and "GF_AES(2^8) finite field" have the same meaning.

7. Isomorphism

Isomorphism means that two finite fields have the same structure, i.e., an element of one finite field can be transformed into an element of the other finite field, and an operation on elements of the one finite field can be transformed into an operation on the corresponding elements of the other finite field. Illustratively, the SM4 finite field and the AES finite field are isomorphic, i.e., an operation on an element A of the SM4 finite field may be converted into an operation on an element B of the AES finite field.

8. Linear transformation

The linear transformation is an operation of treating an element on a finite field as an 8-bit row vector consisting of 0 and 1, and multiplying the row vector by an 8 × 8 binary matrix to obtain another 8-bit row vector.

It is understood that the isomorphic relationship between finite field a and finite field B can be represented as a linear transformation, where elements on a are transformed into elements on B, and elements on B are transformed into elements on a by inverse transformation.
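The statement that an isomorphism between two such finite fields is a linear transformation can be checked with a short sketch. Two assumptions are made that do not come from the text above: the AES field is represented modulo x^8 + x^4 + x^3 + x + 1 (0x11B), and the SM4 field modulo x^8 + x^7 + x^6 + x^5 + x^4 + x^2 + 1 (0x1F5), as commonly reported in the literature on the SM4 S-box. The map built here is one valid isomorphism and is not necessarily the isomorphic matrix used by the present application.

```python
AES_POLY = 0x11B  # assumed reduction polynomial of the AES field
SM4_POLY = 0x1F5  # assumed reduction polynomial of the SM4 field


def gf_mul(a, b, poly):
    """Multiply two field elements (bytes) modulo the given degree-8 polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_pow(a, n, poly):
    """Raise a field element to the n-th power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a, poly)
    return result


def evaluate(poly_bits, beta, field_poly):
    """Evaluate a binary polynomial (bit k = coefficient of x^k) at beta."""
    acc = 0
    for k in range(poly_bits.bit_length()):
        if (poly_bits >> k) & 1:
            acc ^= gf_pow(beta, k, field_poly)
    return acc


# A root of the SM4 polynomial inside the AES field fixes the image of x.
beta = next(b for b in range(2, 256) if evaluate(SM4_POLY, b, AES_POLY) == 0)
# Images of the basis 1, x, ..., x^7: these are the columns of an 8x8 binary matrix.
BASIS_IMAGES = [gf_pow(beta, k, AES_POLY) for k in range(8)]


def sm4_to_aes(value):
    """Linear map from the SM4 field to the AES field (XOR of basis images)."""
    out = 0
    for k in range(8):
        if (value >> k) & 1:
            out ^= BASIS_IMAGES[k]
    return out
```

The map is linear over GF(2) by construction, and the checks below confirm that it is a bijection that carries multiplication in one field to multiplication in the other.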

9. Affine transformations

Affine transformation is an operation of first performing a linear transformation on an element in a finite field, and then adding a constant element. An affine transformation on one finite field can be mapped to an affine transformation on another finite field by isomorphic relationships (i.e., linear transformations).

10. Inverse affine transformation

The inverse affine transformation is an operation of inverting the elements in the finite field and then performing affine transformation.

11. GFNI instruction set

The GFNI instruction set is a CPU instruction set proposed by Intel corporation that can be used to accelerate operations over AES finite fields. The GFNI instruction set includes an instruction to perform affine transformation in the AES finite field, i.e., the VGF2P8AFFINEQB instruction, and the GFNI instruction set also includes an instruction to perform inverse affine transformation in the AES finite field, i.e., the VGF2P8AFFINEINVQB instruction.

12. VGF2P8AFFINEQB instruction

The effect of the VGF2P8AFFINEQB instruction is as follows: given an 8 × 8 binary matrix A and a binary vector C of length 8, the instruction calculates x·A + C over the AES (i.e., GF_AES(2^8)) finite field for a binary vector x of length 8. The VGF2P8AFFINEQB instruction supports registers with a width of 128 bits, 256 bits, or 512 bits. The VGF2P8AFFINEQB instruction may be invoked using assembly language. Optionally, the VGF2P8AFFINEQB instruction may also be called from the C language or the C++ language using the intrinsics interface.

In some implementations, when the VGF2P8AFFINEQB instruction is called using assembly language, the API of the assembly language can be expressed as:

VGF2P8AFFINEQB (parameter 1, parameter 2, parameter 3, parameter 4)

Here, parameter 1 represents a register storing x (each x is 8 bits), and the register may be any one of the following: a 128-bit register (i.e., xmm), a 256-bit register (i.e., ymm), or a 512-bit register (i.e., zmm). Specifically, when the register represented by parameter 1 is a 128-bit register, executing the VGF2P8AFFINEQB instruction once operates on 16 values of x simultaneously; when it is a 256-bit register, executing the instruction once operates on 32 values of x simultaneously; and when it is a 512-bit register, executing the instruction once operates on 64 values of x simultaneously. Parameter 2 represents a register storing the matrix A (each A is 64 bits). Specifically, when the register represented by parameter 2 is a 128-bit register, executing the VGF2P8AFFINEQB instruction once operates on 2 matrices A simultaneously; when it is a 256-bit register, on 4 matrices A; and when it is a 512-bit register, on 8 matrices A. Parameter 3 represents a register storing the results x·A + C, where the position order of the results corresponds to the position order of the values of x in parameter 1. Parameter 4 is an 8-bit immediate value representing the binary vector C involved in the calculation.

In other implementations, when the VGF2P8AFFINEQB instruction is called using the C language or the C++ language, the API of C or C++ can be expressed as:

__m128i _mm_gf2p8affine_epi64_epi8 (parameter #1, parameter #2, parameter #3)

Here, parameter #1 indicates a register storing x, parameter #2 indicates a register storing A, parameter #3 specifies C (an 8-bit immediate value, like parameter 4 above), and _mm_gf2p8affine_epi64_epi8 returns x·A + C. The registers indicated by parameter #1 and parameter #2 may be any one of the following: a 128-bit register, a 256-bit register, or a 512-bit register; the 256-bit and 512-bit cases use the corresponding _mm256_gf2p8affine_epi64_epi8 and _mm512_gf2p8affine_epi64_epi8 intrinsics.
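To make the per-byte and per-lane behaviour described in this section concrete, the following is a small software model of the affine instruction. It is a sketch under assumptions, not a substitute for the instruction reference: the bit ordering (output bit i is the parity of x ANDed with byte 7 - i of the 64-bit matrix, then XORed with bit i of the immediate) follows Intel's published pseudocode and should be checked against it, and the function names `affine_byte` and `gf2p8affine_lanes` are hypothetical.

```python
def affine_byte(matrix_qword, x, imm8):
    """Model of one byte lane: 8x8 binary matrix applied to x, then XOR with imm8."""
    result = 0
    for i in range(8):
        row = (matrix_qword >> (8 * (7 - i))) & 0xFF  # byte 7 - i holds the row for bit i
        parity = bin(row & x).count("1") & 1
        result |= (parity ^ ((imm8 >> i) & 1)) << i
    return result


def gf2p8affine_lanes(x_bytes, matrices, imm8):
    """Model of a whole register: each 64-bit matrix serves its own group of 8 bytes.

    A 128-bit register corresponds to 16 bytes and 2 matrices, a 256-bit
    register to 32 bytes and 4 matrices, a 512-bit register to 64 bytes
    and 8 matrices.
    """
    if len(x_bytes) != 8 * len(matrices):
        raise ValueError("expected exactly 8 data bytes per 64-bit matrix")
    return bytes(
        affine_byte(matrices[index // 8], value, imm8)
        for index, value in enumerate(x_bytes)
    )


# Under the bit ordering above, this constant encodes the 8x8 identity matrix.
IDENTITY_MATRIX = 0x0102040810204080

data = bytes(range(16))  # 16 values of x, as in a 128-bit register
unchanged = gf2p8affine_lanes(data, [IDENTITY_MATRIX, IDENTITY_MATRIX], 0x00)
flipped = gf2p8affine_lanes(data, [IDENTITY_MATRIX, IDENTITY_MATRIX], 0xFF)
constant = gf2p8affine_lanes(data, [0, 0], 0x5A)  # zero matrix leaves only the constant
```

With the identity matrix the model returns x XOR C, which is a convenient first sanity check when wiring real matrices into the instruction.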

13. VGF2P8AFFINEINVQB instruction

The effect of the VGF2P8AFFINEINVQB instruction is as follows: given an 8 × 8 binary matrix A and a binary vector C of length 8, the instruction calculates x^-1·A + C over the AES (i.e., GF_AES(2^8)) finite field for a binary vector x of length 8. The VGF2P8AFFINEINVQB instruction supports registers with a width of 128 bits, 256 bits, or 512 bits. The VGF2P8AFFINEINVQB instruction may be invoked using assembly language. Optionally, the VGF2P8AFFINEINVQB instruction may also be called from the C language or the C++ language using the intrinsics interface.

It will be appreciated that the VGF2P8AFFINEINVQB instruction differs from the VGF2P8AFFINEQB instruction in that the VGF2P8AFFINEINVQB instruction calculates x^-1·A + C, whereas the VGF2P8AFFINEQB instruction calculates x·A + C. For a detailed description of the VGF2P8AFFINEINVQB instruction, refer to the above description of the VGF2P8AFFINEQB instruction, replacing x with x^-1.
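The inverse affine operation can be modeled in the same spirit. The sketch below assumes the AES field polynomial x^8 + x^4 + x^3 + x + 1, the convention that the inverse of 0 is 0, and the matrix bit ordering of Intel's published pseudocode; the function names are hypothetical. As a plausibility check, it uses the widely cited matrix constant 0xF1E3C78F1F3E7CF8 with the immediate 0x63, for which the operation coincides with the AES S-box.

```python
AES_POLY = 0x11B  # assumed reduction polynomial of the AES field


def gf_mul(a, b):
    """Multiply two bytes in the AES field."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return result


def gf_inv(x):
    """Field inverse computed as x^254; maps 0 to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, x)
    return result


def affine_byte(matrix_qword, x, imm8):
    """8x8 binary matrix applied to x, then XOR with imm8 (same model as for the affine instruction)."""
    result = 0
    for i in range(8):
        row = (matrix_qword >> (8 * (7 - i))) & 0xFF
        parity = bin(row & x).count("1") & 1
        result |= (parity ^ ((imm8 >> i) & 1)) << i
    return result


def affineinv_byte(matrix_qword, x, imm8):
    """Invert x in the AES field first, then apply the affine transformation."""
    return affine_byte(matrix_qword, gf_inv(x), imm8)


AES_SBOX_MATRIX = 0xF1E3C78F1F3E7CF8  # affine part of the AES S-box in this bit ordering
AES_SBOX_CONSTANT = 0x63

aes_sbox = [affineinv_byte(AES_SBOX_MATRIX, v, AES_SBOX_CONSTANT) for v in range(256)]
```

Reproducing a known S-box this way is a useful check before substituting matrices mapped from another finite field.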

14. Single Instruction Multiple Data (SIMD) streams

SIMD is a technique that uses one controller to control a plurality of processors, and simultaneously performs the same operation on each of a set of data (also referred to as "data vectors") to achieve spatial parallelism. For example, the AES algorithm may be accelerated using SIMD technology. By storing a plurality of plaintext data blocks into a long register (for example, a register with a length of 128bits, 256bits or 512 bits) at the same time, the acceleration effect of parallel processing of the plurality of plaintext blocks is achieved through one operation on the register, and the performance of the AES algorithm is greatly improved.

An application scenario to which the method for implementing the block cipher algorithm provided by the present application is applicable, and a method, an apparatus, and a device for implementing the block cipher algorithm provided by the present application are described in detail below with reference to the accompanying drawings. It is to be understood that the embodiments and features of the embodiments described below may be combined with each other without conflict between the embodiments provided in the present application. In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.

First, an application scenario applicable to a method for implementing a block cipher algorithm provided in an embodiment of the present application is described with reference to the accompanying drawings.

Fig. 1A is an application scenario of a method for implementing a block cipher algorithm according to an embodiment of the present application. Specifically, the application scenario shown in fig. 1A includes at least one server 10 and at least one terminal 20.

The server 10 may be a server having both storage and computing capabilities, that is, the server 10 has a memory and a processor. The processor of the server 10 may be a Central Processing Unit (CPU). Optionally, the server 10 may further have one or more of a Graphics Processing Unit (GPU), a Neural Network Processing Unit (NPU), and a Field Programmable Gate Array (FPGA). The memory of the server 10 may be a Random Access Memory (RAM), a solid-state drive (SSD), or another device or memory instance with storage capability.

The terminal 20 may be a terminal having both storage and computing capabilities. One or more applications may be installed on the terminal 20, and application data associated with the applications may be generated when the applications are run on the terminal 20. The applications may include chat applications, financial services applications, gaming applications, video applications (e.g., video live applications or video conferencing applications, etc.). For example, the terminal 20 may be a personal computer, a smart phone, a tablet computer, or the like.

In one example, the server 10 may be a physical device deployed in a network. The server 10 and the terminal 20 are connected in communication through a network to realize data transmission. In some implementations, after obtaining the plaintext data to be transmitted from the local storage device, the server 10 encrypts the plaintext data by using an encryption algorithm to obtain ciphertext data, and then transmits the ciphertext data to the terminal 20 through the network. After receiving the ciphertext data, the terminal 20 decrypts the ciphertext data by using a decryption algorithm corresponding to the encryption algorithm to obtain plaintext data. In other implementations, after obtaining the plaintext data to be transmitted from the local storage device, the terminal 20 encrypts the plaintext data by using an encryption algorithm to obtain ciphertext data, and then transmits the ciphertext data to the server 10 through the network. After receiving the ciphertext data, the server 10 decrypts the ciphertext data by using a decryption algorithm corresponding to the encryption algorithm to obtain plaintext data.

It should be understood that the application scenario shown in fig. 1A is only an illustration and does not constitute any limitation on the application scenarios to which the method for implementing the block cipher algorithm provided in the embodiment of the present application is applicable. Optionally, the application scenario may further include a greater number of servers 10 and a greater number of terminals 20. Alternatively, the terminal 20 may be replaced with the server 10.

The following describes a method for implementing a block cipher algorithm provided in an embodiment of the present application with reference to the accompanying drawings.

Fig. 1 is a schematic diagram of a method for implementing a block cipher algorithm according to an embodiment of the present application. The method for implementing the block cipher algorithm provided by the embodiment of the application can be executed by an apparatus for implementing the block cipher algorithm. It is understood that the apparatus may be implemented as software, or as a combination of software and hardware. For example, the apparatus in the embodiment of the present application may be, but is not limited to, a server or a terminal device used by a user. As shown in fig. 1, the method for implementing the block cipher algorithm provided in the embodiment of the present application includes S110 to S130. Next, S110 to S130 are described in detail.

S110, plaintext data to be encrypted is obtained.

The data length and the data type of the plaintext data to be encrypted are not particularly limited. For example, the plaintext data to be encrypted may be, but is not limited to, 128-bit plaintext data; in this case, the 128-bit plaintext data may be further divided into 4 groups, denoted (X_0, X_1, X_2, X_3), where each X_k (k = 0, 1, 2, 3) is 32-bit data. Optionally, the 128-bit plaintext data may instead be divided into 8 groups, i.e., (X_0, X_1, X_2, X_3, X_4, X_5, X_6, X_7), where each X_k (k = 0, 1, 2, 3, 4, 5, 6, 7) is 16-bit data. As another example, the plaintext data to be encrypted may be, but is not limited to, data stored in a cloud server or a user terminal device (e.g., a mobile phone or a tablet computer).
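The two groupings of a 128-bit block mentioned above can be illustrated with a short sketch. Big-endian byte order is assumed here because it is the convention commonly used for SM4 test vectors; the function names are hypothetical.

```python
import struct


def split_into_words(block):
    """Split a 16-byte block into four 32-bit groups (X_0, X_1, X_2, X_3)."""
    if len(block) != 16:
        raise ValueError("a block must be exactly 16 bytes (128 bits)")
    return struct.unpack(">4I", block)


def split_into_halfwords(block):
    """Split a 16-byte block into eight 16-bit groups (X_0, ..., X_7)."""
    if len(block) != 16:
        raise ValueError("a block must be exactly 16 bytes (128 bits)")
    return struct.unpack(">8H", block)


block = bytes.fromhex("0123456789abcdeffedcba9876543210")
words = split_into_words(block)
halfwords = split_into_halfwords(block)
```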

The manner of obtaining the plaintext data to be encrypted is not particularly limited. For example, when the server performs S110 described above, it may be that the server sends a request to the device storing the plaintext data to acquire the plaintext data. As another example, the device storing the plaintext data may also actively send the plaintext data to the server.

S120, encrypting the plaintext data to be encrypted by using a first block cipher algorithm to obtain ciphertext data, wherein the calculation of a round function in the first block cipher algorithm is implemented by using an instruction set, the round function comprises a nonlinear transformation, and the instruction set comprises an instruction for computing the nonlinear transformation.

In the embodiment of the present application, encrypting the plaintext data to be encrypted by using the first block cipher algorithm to obtain ciphertext data, where the calculation of the round function in the first block cipher algorithm is implemented by using an instruction set, includes the following steps: obtaining input data of the nonlinear transformation, where the input data of the nonlinear transformation is determined according to a round key and the plaintext data, the input data of the nonlinear transformation is data in a first finite field, and the first finite field is the finite field of the first block cipher algorithm; in a second finite field, performing the nonlinear transformation on the input data of the nonlinear transformation by using the instruction set to obtain a result of the nonlinear transformation, where the result of the nonlinear transformation is data in the second finite field, the second finite field is the finite field of a second block cipher algorithm, the second finite field and the first finite field have an isomorphic relationship, and the first block cipher algorithm is different from the second block cipher algorithm; and obtaining the ciphertext data according to the result of the nonlinear transformation, where the ciphertext data is data in the first finite field. Fig. 2 shows a schematic flow chart of the above "encrypting the plaintext data to be encrypted by using the first block cipher algorithm to obtain ciphertext data". As shown in fig. 2, the flow includes S210 to S230. In one example, when the first block cipher algorithm is the SM4 algorithm, 32 rounds of iterative computation need to be performed when executing the SM4 algorithm.

In this implementation, determining the input data of the nonlinear transformation according to the round key and the plaintext data includes: when the first round of iterative computation is executed, determining the input data of the nonlinear transformation according to the round key corresponding to the first round of iteration and the plaintext data; or, when any round of iterative computation after the first round is executed, determining the input data of the nonlinear transformation according to the round key corresponding to that round and the internal state of the SM4 algorithm in that round. The internal state of the SM4 algorithm in any round of iterative computation is the state produced by the round of iterative computation immediately preceding that round. When the current iteration is the first iteration, the internal state of the SM4 algorithm is the plaintext data input to the SM4 algorithm. The following description takes obtaining the input data of the nonlinear transformation for the 1st round of iterative computation as an example. Specifically, the plaintext data input to the SM4 algorithm in the 1st iteration is (X_0, X_1, X_2, X_3), where each X_k (k = 0, 1, 2, 3) is 32-bit data. The round key rk_0 corresponding to the 1st round of iterative computation is XORed with X_1, X_2 and X_3 to obtain 32-bit data, and the 32-bit data is split into 4 pieces of 8-bit data to obtain the input data of the nonlinear transformation corresponding to the 1st round of iterative computation. The XOR of the round key rk_0 with X_1, X_2 and X_3 can be expressed as X_1 ⊕ X_2 ⊕ X_3 ⊕ rk_0.

In this implementation, (X_0, X_1, X_2, X_3) is also referred to as the internal state of the SM4 algorithm at the 1st round of iteration. The following description takes obtaining the input data of the nonlinear transformation for the 2nd round of iterative computation as an example. Specifically, the internal state of the SM4 algorithm in the 2nd round of iterative computation is (X_1, X_2, X_3, X_4), where X_4 is the output result of the 1st round of iterative computation. The round key rk_1 corresponding to the 2nd round of iterative computation is XORed with X_2, X_3 and X_4 to obtain 32-bit data, and the 32-bit data is split into 4 pieces of 8-bit data to obtain the input data of the nonlinear transformation corresponding to the 2nd round of iterative computation. The XOR of the round key rk_1 with X_2, X_3 and X_4 can be expressed as X_2 ⊕ X_3 ⊕ X_4 ⊕ rk_1.

By analogy, the input data of the nonlinear transformation for the 3rd round of iterative computation, and so on up to the input data of the nonlinear transformation for the 32nd round of iterative computation, can be obtained.
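The way the S-box input is formed in each round can be sketched as follows. The function name `sbox_input_bytes` and the sample values are hypothetical; the byte order of the split (most significant byte first) is an assumption.

```python
def sbox_input_bytes(state, round_key):
    """Return the four 8-bit inputs of the nonlinear transformation for one round.

    state is the internal state (X_i, X_{i+1}, X_{i+2}, X_{i+3}); the first
    word is not used here because it is only XORed in at the end of the round.
    """
    _, x1, x2, x3 = state
    word = x1 ^ x2 ^ x3 ^ round_key  # 32-bit XOR of three state words and the round key
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


# Arbitrary sample values, chosen so the XOR is easy to follow by hand.
sample_state = (0x00000000, 0x01020304, 0x10203040, 0x00000000)
sample_round_key = 0xFF000000
sample_bytes = sbox_input_bytes(sample_state, sample_round_key)
```

For the sample values, the XOR of the three words and the round key is 0xEE223344, which splits into the bytes 0xEE, 0x22, 0x33 and 0x44.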

The round key is determined based on the encryption key, and the manner of obtaining the round key is not particularly limited. For example, the round key may be obtained from the encryption key in the manner specified in the SM4 algorithm standard. It will be appreciated that the decryption algorithm and the encryption algorithm of the SM4 algorithm work on the same principle; the only difference is that the decryption algorithm uses the round keys in the reverse of the order used by the encryption algorithm. For example, if the order of the round keys used by the encryption algorithm when performing the 32 rounds of iterative computation is (rk_0, rk_1, ..., rk_31), then the order of the round keys used by the decryption algorithm when performing the 32 rounds of iterative computation is (rk_31, rk_30, ..., rk_0).

The second finite field has an isomorphic relationship with the first finite field, that is, operations on data in the second finite field may be converted to operations on data in the first finite field, or operations on data in the first finite field may be converted to operations on data in the second finite field. For convenience of description, in the embodiments of the present application, data in the first finite field may be converted to data in the second finite field using a matrix M (also referred to as an isomorphic matrix), and data in the second finite field may be converted to data in the first finite field using the matrix M^-1 (also referred to as the inverse of the isomorphic matrix). For example, when the first block cipher algorithm is the SM4 algorithm and the second block cipher algorithm is the AES algorithm, the matrix M and the matrix M^-1 can be defined as follows:

The embodiment of the present application provides three implementations of "in the second finite field, performing the nonlinear transformation on the input data of the nonlinear transformation by using the instruction set to obtain the result of the nonlinear transformation". The three implementations are described in detail below with reference to the drawings.

Implementation one:

In the first implementation, the nonlinear transformation includes a first affine transformation and a first inverse affine transformation, and performing the nonlinear transformation in the second finite field by using the instruction set to obtain the result of the nonlinear transformation includes: performing the first affine transformation on first data, second data and the input data of the nonlinear transformation by using a first instruction in the instruction set to obtain a first affine transformation result, where the first data is data obtained by mapping a preset matrix into the second finite field according to an isomorphic matrix, the second data is data obtained by mapping a preset vector into the second finite field according to the isomorphic matrix, and the isomorphic matrix is used for indicating the isomorphic relationship; and performing the first inverse affine transformation on the preset matrix, the preset vector and the first affine transformation result by using a second instruction in the instruction set to obtain the result of the nonlinear transformation. Fig. 3 shows a flow diagram for obtaining the result of the nonlinear transformation using the above implementation. As shown in fig. 3, the flow includes S310 and S320.

In this implementation, the first data is data obtained by mapping the preset matrix into the second finite field according to the isomorphic matrix; that is, the first data is specifically the product of the isomorphic matrix, the preset matrix, and the inverse matrix of the isomorphic matrix. The preset matrix is a predetermined 8 × 8 binary matrix. The second data is data obtained by mapping the preset vector into the second finite field according to the isomorphic matrix; that is, the second data is specifically the product of the isomorphic matrix and the preset vector. The preset vector may be a predetermined 1 × 8 binary row vector. The isomorphic matrix is used to indicate the isomorphic relationship; in particular, the isomorphic matrix is a mapping matrix that converts data in the first finite field to data in the second finite field while preserving the field structure, for example, the matrix M shown in formula (6) above.

In some possible implementations, when the first block cipher algorithm is the SM4 algorithm and the second block cipher algorithm is the AES algorithm, the nonlinear transformation in the round function may be understood as the S-box specified in the SM4 algorithm standard, which may be represented by the following formula:

$$\mathrm{SBox}(x) = A_{SM4} \cdot (x \cdot A_{SM4} + C_{SM4})^{-1} + C_{SM4} \tag{7}$$

where A_SM4 is the preset matrix, C_SM4 is the preset vector, and x is the input data of the S-box, which is 8-bit binary data. The preset matrix A_SM4 and the preset vector C_SM4 are defined as follows:

Also taking the case where the nonlinear transformation is the S-box in the SM4 algorithm (i.e., the content shown in formula (7) above) as an example, the first data A_SM4' and the second data C_SM4' in the first implementation can be expressed by the following formulas, respectively:

$$A_{SM4}' = M \cdot A_{SM4} \cdot M^{-1} \tag{9}$$

where A_SM4' denotes the matrix obtained by converting the preset matrix A_SM4 in the SM4 finite field to the AES finite field.

$$C_{SM4}' = M \cdot C_{SM4} \tag{10}$$

where C_SM4' denotes the vector obtained by converting the preset vector C_SM4 in the SM4 finite field to the AES finite field.
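The mapping of an affine transformation from one field to the other, in the form of formulas (9) and (10), can be checked numerically over GF(2). The sketch below is an illustration under assumptions: matrices act on column vectors, a matrix is stored as a list of 8 row bitmasks, and `TOY_M`, `TOY_A` and `TOY_C` are hypothetical values chosen only so that the isomorphic-matrix stand-in is invertible; they are not the matrices of the present application.

```python
def mat_vec(matrix, vector):
    """Multiply an 8x8 binary matrix (list of row bitmasks) by an 8-bit column vector."""
    return sum((bin(row & vector).count("1") & 1) << i for i, row in enumerate(matrix))


def mat_mul(a, b):
    """Product a*b over GF(2): row i of the result is the XOR of the rows of b selected by a[i]."""
    product = []
    for row in a:
        acc = 0
        for j in range(8):
            if (row >> j) & 1:
                acc ^= b[j]
        product.append(acc)
    return product


def mat_inv(matrix):
    """Invert an 8x8 binary matrix by Gauss-Jordan elimination over GF(2)."""
    a = list(matrix)
    inv = [1 << i for i in range(8)]
    for col in range(8):
        pivot = next(r for r in range(col, 8) if (a[r] >> col) & 1)
        a[col], a[pivot] = a[pivot], a[col]
        inv[col], inv[pivot] = inv[pivot], inv[col]
        for r in range(8):
            if r != col and (a[r] >> col) & 1:
                a[r] ^= a[col]
                inv[r] ^= inv[col]
    return inv


# Hypothetical stand-ins (NOT the patent's matrices).
TOY_M = [(1 << i) | ((1 << i) >> 1) for i in range(8)]  # unit lower-bidiagonal, hence invertible
TOY_A = [0xA7, 0x4E, 0x9D, 0x3B, 0x76, 0xEC, 0xD3, 0x6A]
TOY_C = 0xD3

TOY_M_INV = mat_inv(TOY_M)
A_MAPPED = mat_mul(mat_mul(TOY_M, TOY_A), TOY_M_INV)  # A' = M * A * M^-1, as in formula (9)
C_MAPPED = mat_vec(TOY_M, TOY_C)                      # C' = M * C, as in formula (10)
```

The defining property is that applying the mapped affine transformation to a mapped input gives the mapped output of the original affine transformation, which is what allows the computation to be carried out entirely in the second finite field.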

For example, fig. 6 shows an embodiment, in the second implementation manner, of implementing a round function that includes the nonlinear transformation by using the instruction set. Specifically, fig. 6 is described by taking as an example the case where the input data of the nonlinear transformation is x, the first block cipher algorithm is the SM4 algorithm, and the second block cipher algorithm is the AES algorithm, where x is 8-bit binary data. Optionally, when the input data of the nonlinear transformation includes a plurality of values of x, the same operation may be performed for each x. See the description of fig. 6 below for details, which are not repeated here.

Optionally, in the first implementation, the round function further includes a linear transformation. That is, after the first implementation is performed, obtaining the ciphertext data according to the result of the nonlinear transformation includes: performing the linear transformation on the result of the nonlinear transformation to obtain a result of the linear transformation; performing an exclusive-or operation on the result of the linear transformation and the internal state of the first block cipher algorithm to obtain an output result of the round function, where the internal state of the first block cipher algorithm is associated with the input data of the nonlinear transformation; and, when the round key is the key used by the last round of iterative computation in the first block cipher algorithm, arranging in reverse order the mapping result of the output result of the round function obtained by the last round of iterative computation, and determining the result of the reverse-order arrangement as the ciphertext data, where the mapping result of the output result of the round function obtained by any round of iterative computation is the result of mapping that output result to the first finite field by using the isomorphic matrix. Performing the linear transformation on the result of the nonlinear transformation to obtain the result of the linear transformation includes: performing a cyclic shift operation on the result of the nonlinear transformation to obtain cyclic shift operation results; and performing, by using the first instruction, a linear transformation on the cyclic shift operation results according to a preset matrix set to obtain the result of the linear transformation, where the number of preset matrices included in the preset matrix set is related to the number of cyclic shift operation results. That is, in this implementation, the linear transformation operation is performed by using the first instruction in the instruction set.

For example, when the first block cipher algorithm is the SM4 algorithm, the SM4 algorithm includes 32 iterations, and the internal state of the SM4 algorithm at the 1st iteration is (X_0, X_1, X_2, X_3). When the 1st round of iterative computation is executed, performing the exclusive-or operation on the result of the linear transformation and the internal state of the first block cipher algorithm to obtain the output result of the round function corresponding to the 1st round of iterative computation includes: performing an exclusive-or operation on X_0 and the transformation result obtained by the 1st round of iterative computation. In the 2nd iteration, the internal state of the SM4 algorithm changes from (X_0, X_1, X_2, X_3) to (X_1, X_2, X_3, X_4), where X_4 is the output result of the 1st iteration of the SM4 algorithm. When the 2nd round of iterative computation is executed, performing the exclusive-or operation on the result of the linear transformation and the internal state of the first block cipher algorithm to obtain the output result of the round function corresponding to the 2nd round of iterative computation includes: performing an exclusive-or operation on X_1 and the transformation result obtained by the 2nd round of iterative computation. By analogy, the output result of the round function corresponding to the 3rd round of iterative computation, and so on up to the output result of the round function corresponding to the 32nd round of iterative computation, can be obtained.

Optionally, in other implementations, the linear transformation operation may be performed without using the first instruction in the instruction set. For example, the result of the linear transformation may be obtained by directly computing the matrix products of the preset matrices in the preset matrix set with the cyclic shift operation results.

In some possible implementations, when the first block cipher algorithm is the SM4 algorithm and the second block cipher algorithm is the AES algorithm, the linear transformation in the round function described above may be understood as the L layer specified in the SM4 algorithm standard. In this case, performing the cyclic shift operation on the result of the nonlinear transformation to obtain a cyclic shift operation result includes: cyclically right-shifting the result of the nonlinear transformation by 8 bits, 16 bits, and 24 bits respectively, which yields 3 cyclic shift operation results. The number of preset matrices in the preset matrix set is tied to the number of cyclic shift operation results: it equals 1 plus 3, where "3" is the number of results obtained by the cyclic shift operation. That is, the preset matrix set in this implementation includes 4 preset matrices, each of which is an 8 × 8 binary matrix. Illustratively, the L-layer transformation in the SM4 algorithm can be represented by the following formula:

Alternatively, the above formula (11) may also be expressed by the following formula:

In the above formulas (11) and (12), $(x_0, x_1, x_2, x_3)$ represents the input data of the L-layer transformation, and $L_0$, $L_1$, $L_2$, and $L_3$ are the matrices included in the preset matrix set. In some possible implementations, the 4 preset matrices are defined as follows:

For example, fig. 7 shows an embodiment of implementing the round function (including the nonlinear transformation and the linear transformation) by using an instruction set in the first implementation. Specifically, fig. 7 is described by taking as an example the case in which the input data of the nonlinear transformation is $(x_i, x_{i+1}, x_{i+2}, x_{i+3})$, $i = 0, 1, \ldots, 31$, the current iteration is the ith iteration, the first block cipher algorithm is the SM4 algorithm, and the second block cipher algorithm is the AES algorithm. For details, see the description of fig. 7 below, which is not repeated here.

Implementation mode two:

In the second implementation, the nonlinear transformation includes a first affine transformation, a second affine transformation, and an inverse transformation. In the second finite field, implementing the nonlinear transformation on the input data of the nonlinear transformation by using the instruction set to obtain the result of the nonlinear transformation includes: performing the first affine transformation on first data, second data, and a mapping result of the input data by using the first instruction in the instruction set to obtain a first affine transformation result, where the mapping result of the input data is the data obtained by mapping the input data into the second finite field according to an isomorphic matrix, the first data is the data obtained by mapping a preset matrix into the second finite field according to the isomorphic matrix, the second data is the data obtained by mapping a preset vector into the second finite field according to the isomorphic matrix, and the isomorphic matrix indicates the isomorphic relationship; performing the inverse transformation on the first affine transformation result by using the second instruction in the instruction set to obtain an inverse transformation result; and performing the second affine transformation on the first data, the second data, and the inverse transformation result by using the first instruction to obtain the result of the nonlinear transformation. Fig. 4 shows a flowchart of obtaining the result of the nonlinear transformation using the second implementation. As shown in fig. 4, the flow includes S410 to S430.

Optionally, the first instruction in the instruction set and the isomorphic matrix may be further used to map the input data of the nonlinear transformation into the second finite field, so as to obtain a mapping result of the input data.

It can be understood that, the definitions of the isomorphic vector, the preset matrix, the first data and the second data in the second implementation are respectively the same as those in the first implementation, and details that are not described herein in detail may specifically refer to the relevant description in the first implementation.

Optionally, in the second implementation, the round function further includes a linear transformation. That is to say, after the second implementation manner is executed, the method of "obtaining ciphertext data according to the result of the nonlinear transformation" described in the first implementation manner may also be executed, and details that are not described in detail herein may refer to relevant contents in the first implementation manner.

It is to be understood that, in the second implementation, when the first block cipher algorithm is the SM4 algorithm, the nonlinear transformation in the round function may be understood as the S-box specified in the SM4 algorithm standard, and the linear transformation in the round function may be understood as the L layer specified in the SM4 algorithm standard.

By way of example, fig. 8 below illustrates an embodiment of implementing the above nonlinear transformation (comprising the first affine transformation and the first inverse affine transformation) by using an instruction set in the above implementation one. Specifically, fig. 8 is described by taking as an example the case in which the input data of the nonlinear transformation is x, the first block cipher algorithm is the SM4 algorithm, and the second block cipher algorithm is the AES algorithm, where x is 8-bit binary data. Optionally, when the input data of the nonlinear transformation includes a plurality of x, the same operation may be performed on each x. For details, see the description of fig. 8 below, which is not repeated here.

Implementation mode three:

In the third implementation, the nonlinear transformation includes a first affine transformation and a first inverse affine transformation. In the second finite field, implementing the nonlinear transformation on the input data of the nonlinear transformation by using the instruction set to obtain the result of the nonlinear transformation includes: performing the first affine transformation on first data, second data, and a mapping result of the input data by using the first instruction in the instruction set to obtain a first affine transformation result, where the mapping result of the input data is the data obtained by mapping the input data into the second finite field according to an isomorphic matrix, the first data is the data obtained by mapping a preset matrix into the second finite field according to the isomorphic matrix, the second data is the data obtained by mapping a preset vector into the second finite field according to the isomorphic matrix, and the isomorphic matrix indicates the isomorphic relationship; performing a cyclic shift operation on the first affine transformation result to obtain a cyclic shift operation result; and performing the first inverse affine transformation on the cyclic shift operation result, third data, and fourth data by using the second instruction in the instruction set to obtain the result of the nonlinear transformation, where the third data is the data obtained by mapping the dot product of the preset matrix set and the preset matrix into the second finite field, and the fourth data is the data obtained by mapping the preset matrix set and the preset vector into the second finite field. Fig. 5 shows a flowchart of obtaining the result of the nonlinear transformation using the third implementation. As shown in fig. 5, the flow includes S510 to S540.

It can be understood that the definitions of the preset matrix set, the isomorphic vector, the preset matrix, the first data, and the second data in the third implementation are respectively the same as those in the first implementation; for details not described here, refer to the relevant description in the first implementation.

Optionally, in another possible implementation, on the basis of the third implementation, obtaining the ciphertext data according to the result of the nonlinear transformation includes: performing an exclusive-or operation on the result of the nonlinear transformation and the internal state of the first block cipher algorithm to obtain an output result of the round function, where the internal state of the first block cipher algorithm is associated with the input data of the nonlinear transformation; and, when the round key is the key used by the last round of iterative computation in the first block cipher algorithm, arranging in reverse order the mapping results of the round function outputs obtained in the last four rounds of iterative computation in the first block cipher algorithm, and determining the result of the reverse-order arrangement as the ciphertext data. Here, the mapping result of the round function output obtained in any round of iterative computation is the result of mapping that round function output into the first finite field by using the isomorphic matrix. The method of "performing an exclusive-or operation on the result of the nonlinear transformation and the internal state of the first block cipher algorithm to obtain the output result of the round function" is the same as that described in the first implementation; for details not described here, refer to the related description of the first implementation.
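The reverse-order arrangement of the last four round outputs can be sketched as below. This is a minimal Python illustration under stated assumptions: `reverse_last_four` is a hypothetical helper name, the words are assumed to be 32-bit values serialized big-endian, and the mapping back into the first finite field with the isomorphic matrix is assumed to have been applied to the words already.

```python
import struct


def reverse_last_four(words):
    """Arrange the last four 32-bit round outputs in reverse order.

    `words` is the sequence of round outputs; only the final four are used.
    The result is returned as a 16-byte block (big-endian words, assumed).
    """
    last = words[-4:]
    return struct.pack(">4I", *reversed(last))


# Example: the last four words are emitted newest first.
block = reverse_last_four(
    [0x00000000, 0x01020304, 0x05060708, 0x090A0B0C, 0x0D0E0F10]
)
```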

For example, fig. 9 shows an embodiment of implementing the round function (including the nonlinear transformation and the linear transformation) by using an instruction set in the third implementation. Specifically, fig. 9 is described by taking as an example the case in which the input data of the nonlinear transformation is $(x_i, x_{i+1}, x_{i+2}, x_{i+3})$, $i = 0, 1, \ldots, 31$, the current iteration is the ith iteration, the first block cipher algorithm is the SM4 algorithm, and the second block cipher algorithm is the AES algorithm. For details, see the description of fig. 9 below, which is not repeated here.

S130, output the ciphertext data.

It should be understood that fig. 1 above is only an illustration and does not limit the implementation method of the block cipher algorithm provided in the embodiments of the present application. For example, when the first block cipher algorithm is the SM4 algorithm and the second block cipher algorithm is the AES algorithm, the key expansion algorithm defined in the SM4 algorithm standard (for example, formulas (3) to (5) above) may also be implemented by using the first instruction and the second instruction in the instruction set, following the same working principle by which the SM4 algorithm is implemented with the first instruction and the second instruction in fig. 1 above.
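Because the key expansion reuses the same round structure, the same S-box routine can serve both. The sketch below is a minimal Python outline of the SM4 key schedule as it is commonly published (system parameters FK, fixed parameters CK, and the linear map with rotations by 13 and 23 bits); it is not taken from formulas (3) to (5) of this application. `tau` is a hypothetical callable standing for the byte-wise S-box applied to a 32-bit word, for instance one built on the first and second instructions.

```python
def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


FK = (0xA3B1BAC6, 0x56AA3350, 0x677D9197, 0xB27022DC)


def ck(i):
    """Fixed parameter CK_i: byte j equals (4*i + j) * 7 mod 256."""
    word = 0
    for j in range(4):
        word = (word << 8) | (((4 * i + j) * 7) & 0xFF)
    return word


def l_prime(b):
    """Linear map used in the key schedule."""
    return b ^ rotl32(b, 13) ^ rotl32(b, 23)


def expand_key(mk_words, tau):
    """Derive 32 round keys from four 32-bit master-key words.

    tau -- placeholder for the byte-wise S-box on a 32-bit word (assumed).
    """
    k = [mk ^ fk for mk, fk in zip(mk_words, FK)]
    round_keys = []
    for i in range(32):
        t = l_prime(tau(k[i + 1] ^ k[i + 2] ^ k[i + 3] ^ ck(i)))
        k.append(k[i] ^ t)
        round_keys.append(k[i + 4])
    return round_keys
```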

In the embodiment of the present application, the calculation of the round function in the first block cipher algorithm is implemented by using the instruction set, the round function includes the nonlinear transformation, and the instruction set includes an instruction for solving the nonlinear transformation, so the efficiency of computing the round function can be improved. Few instructions are consumed when the first instruction and the second instruction in the instruction set are used to compute the round function, which further improves operation efficiency. The first instruction and the second instruction are instructions that a general-purpose processor (for example, a CPU) can run, and they support processing data of various lengths, so the method has good universality and practical value. In addition, SIMD technology can be used to read data or matrices in parallel, which makes the method still more broadly applicable.

Fig. 6 is a schematic diagram of a method for implementing a round function in a block cipher algorithm according to an embodiment of the present application. Specifically, the round function includes a nonlinear transformation. Fig. 6 is described by taking as an example the case in which the first block cipher algorithm is the SM4 algorithm and the second block cipher algorithm is the AES algorithm. In this implementation, the nonlinear transformation may be understood as the S-box specified in the SM4 algorithm standard, and the algebraic expression of the S-box is shown in formula (7) above. As shown in fig. 6, the flow includes S610 and S620, which are described in detail below.

S610, perform an affine transformation on the input data x of the S-box, the matrix $A_1$, and the matrix $C_1$ by using the first instruction in the GFNI instruction set, to obtain an affine transformation result.

Here, the matrix $A_1$ is the result of dot-multiplying the matrix $A_{SM4}$ by the isomorphic matrix, and the matrix $C_1$ is the result of dot-multiplying the preset vector $C_{SM4}$ by the isomorphic matrix.

In this application, the affine transformation is performed on the input data x of the S-box, the matrix $A_1$, and the matrix $C_1$ to obtain the affine transformation result. The affine transformation result $x'$ can be expressed by the following formula:

$$x' = x \cdot A_1 + C_1 \tag{14}$$

The above $A_1$ can be expressed by the following formula:

$$A_1 = A_{SM4} \cdot M \tag{15}$$

The above $C_1$ can be expressed by the following formula:

$$C_1 = C_{SM4} \cdot M \tag{16}$$
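To make the affine step concrete, the following Python sketch emulates what the first instruction (GF2P8AFFINEQB) computes for each byte lane, following the per-bit rule in Intel's instruction reference: output bit i is the parity of (matrix byte 7−i AND input byte), XORed with bit i of the immediate. The function names are hypothetical, and the identity matrix is used only to exercise the emulation; the matrix $A_1$ of formula (15) is not reproduced here.

```python
def parity(v):
    """Return 1 if v has an odd number of set bits, else 0."""
    return bin(v).count("1") & 1


def gf2p8affine_byte(x, matrix_qword, imm8):
    """Emulate GF2P8AFFINEQB on a single byte lane.

    matrix_qword -- 64-bit value holding the 8x8 bit matrix (8 row bytes)
    imm8         -- 8-bit constant XORed into the result
    """
    out = 0
    for i in range(8):
        row = (matrix_qword >> (8 * (7 - i))) & 0xFF
        bit = parity(row & x) ^ ((imm8 >> i) & 1)
        out |= bit << i
    return out


def gf2p8affine(data, matrix_qword, imm8):
    """Apply the same affine map to every byte of `data`."""
    return bytes(gf2p8affine_byte(b, matrix_qword, imm8) for b in data)


# Matrix encoding for which the linear part is the identity map.
IDENTITY = 0x0102040810204080

unchanged = gf2p8affine(bytes(range(16)), IDENTITY, 0x00)
```

With any matrix, a zero input byte returns the immediate itself, which is why a constant such as `0ceh` shows up directly as the first output byte when the first input byte is `00`.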

S620, perform an inverse affine transformation on the affine transformation result, the matrix $A_2$, and the preset vector $C_{SM4}$ by using the second instruction in the GFNI instruction set, to obtain the output data of the S-box.

Here, the matrix $A_2$ is the result of dot-multiplying the inverse of the isomorphic matrix by the matrix $A_{SM4}$.

In the present application, the inverse affine transformation is performed on the affine transformation result, the matrix $A_2$, and the preset vector $C_{SM4}$ to obtain the output data of the S-box. The output data $y$ of the S-box can be expressed by the following formula:

$$y = (x')^{-1} \cdot A_2 + C_{SM4} \tag{17}$$

The above $A_2$ can be expressed by the following formula:

$$A_2 = M^{-1} \cdot A_{SM4} \tag{18}$$
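The second instruction (GF2P8AFFINEINVQB) first inverts each byte in the AES field $GF(2^8)$ with reduction polynomial $x^8 + x^4 + x^3 + x + 1$ (zero maps to zero) and then applies the same kind of affine map. The Python sketch below emulates this per byte; function names are hypothetical. Since the matrix $A_2$ of formula (18) is not reproduced in this passage, the emulation is checked against the well-known AES S-box constants (matrix `0xF1E3C78F1F3E7CF8`, constant `0x63`) rather than against SM4.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1


def gf_mul(a, b):
    """Multiply two bytes in the AES field."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return r


def gf_inv(a):
    """Multiplicative inverse as a^254; maps 0 to 0."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def affine_byte(x, matrix_qword, imm8):
    """Bit-matrix multiply plus constant, as in GF2P8AFFINEQB."""
    out = 0
    for i in range(8):
        row = (matrix_qword >> (8 * (7 - i))) & 0xFF
        bit = (bin(row & x).count("1") & 1) ^ ((imm8 >> i) & 1)
        out |= bit << i
    return out


def gf2p8affineinv_byte(x, matrix_qword, imm8):
    """Emulate GF2P8AFFINEINVQB on one byte: invert, then affine map."""
    return affine_byte(gf_inv(x), matrix_qword, imm8)


AES_SBOX_MATRIX = 0xF1E3C78F1F3E7CF8  # used here only to test the emulation
```

Formula (17) has exactly this shape: an inversion followed by a matrix product and the addition of a constant, which is why a single second instruction suffices for S620.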

Optionally, before S610 is performed, the input data of the S-box in the SM4 finite field may also be mapped into the AES finite field by using the first instruction. Specifically, the input data of the S-box in the SM4 finite field is dot-multiplied by the isomorphic matrix M, and the product is the result of mapping the input data of the S-box from the SM4 finite field into the AES finite field.

In this implementation, when the computation of the S-box is implemented based on the GFNI instruction set, only the necessary algebraic operations are executed, so no additional operations need to be introduced to cancel out invalid operations, and the consumption of system resources can be reduced. Specifically, the calculation flow of the S-box in the round function of the SM4 algorithm can be implemented with only three instructions (that is, 2 first instructions and 1 second instruction), which greatly reduces the number of instructions required and improves operation efficiency. The method can process common numbers of plaintext blocks (for example, 8, 16, or 32 blocks of plaintext data); that is, the SM4 computation does not have to wait until 128 or 256 blocks of plaintext data have been batched together, so the method has good universality and practical value. The method can be executed on a general-purpose processor (which must be able to run the first and second instructions in the GFNI instruction set) without dedicated hardware, which makes it more versatile.

It should also be understood that the above S610 and S620 are described by taking the input data of the S box as x. Alternatively, the input data of the S-box may also be a data set including a plurality of x, and the plurality of x may be the same or different.

The calculation flow of the S-box of the SM4 algorithm provided in the embodiment of the present application is described in detail above with reference to S610 and S620, and a method for implementing the calculation flow of the S-box of the SM4 algorithm by using the GFNI instruction set is further described below based on the calculation flow of the S-box.

For illustration, the following provides a method for implementing the above S610 by using the VGF2P8AFFINEQB instruction in the GFNI instruction set and a method for implementing the above S620 by using the VGF2P8AFFINEINVQB, taking the assembler API whose operand is a 128-bit wide register as an example.

The method for implementing S610 by using the VGF2P8AFFINEQB instruction includes: organizing the input data of the S-box in groups of 16, storing a group in any one xmm register, and storing the matrix $A_1$, in the form of column vectors, into the lower 64 bits and the upper 64 bits of another xmm register. For example, when the register storing the input group of the S-box is xmm0, the register storing the matrix $A_1$ is xmm1, and the register storing the output data is xmm2, S610 implemented through the assembly interface of the VGF2P8AFFINEQB instruction in the GFNI instruction set is as follows:

VGF2P8AFFINEQB(xmm0,xmm1,xmm2,0ceh) (19)

The method for implementing S620 by using the VGF2P8AFFINEINVQB instruction includes: storing the matrix $A_2$, in the form of column vectors, into the upper 64 bits and the lower 64 bits of an xmm register. For example, when the register storing $A_2$ is xmm3 and the register storing the output data is xmm4, S620 implemented through the assembly interface of the VGF2P8AFFINEINVQB instruction in the GFNI instruction set is as follows:

VGF2P8AFFINEINVQB(xmm2,xmm3,xmm4,0d3h) (20)
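An implementation of instructions (19) and (20) can be checked against the published SM4 S-box table. The short Python sketch below does this for inputs in the range 00 to 0F, using the first row of the S-box table from the SM4 standard; `matches_sbox_row0` is a hypothetical helper, and the hexadecimal strings are the input and final output quoted in the worked example of this passage.

```python
# First 16 entries of the SM4 S-box table (inputs 0x00 .. 0x0F).
SM4_SBOX_ROW0 = bytes.fromhex("d690e9fecce13db716b614c228fb2c05")


def matches_sbox_row0(input_hex, output_hex):
    """Check that each output byte equals the S-box entry of its input byte.

    Only inputs below 0x10 are supported, because only row 0 is tabulated.
    """
    data_in = bytes.fromhex(input_hex)
    data_out = bytes.fromhex(output_hex)
    if len(data_in) != len(data_out):
        return False
    return all(
        x < 16 and SM4_SBOX_ROW0[x] == y for x, y in zip(data_in, data_out)
    )


ok = matches_sbox_row0(
    "000102030405060708090A0B0C0D0E0F",
    "D690E9FECCE13DB716B614C228FB2C05",
)
```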

Finally, the results of computing the simplified S610 and S620 above are given for a specific input. For example, when the input data of the S-box operation in S610 is 000102030405060708090A0B0C0D0E0F (sixteen 8-bit inputs grouped together and written in hexadecimal), the output after executing the instruction shown in formula (19) is CE09B47315D26FA816D16CABCD0AB770, and the output after executing the instruction shown in formula (20) is D690E9FECCE13DB716B614C228FB2C05. Comparison with the SM4 standard shows that the output after executing the instruction shown in formula (20) is consistent with the output of the S-box in the SM4 standard.

Fig. 7 is a schematic diagram of a method for implementing a round function in a block cipher algorithm according to an embodiment of the present application. Specifically, the round function includes a nonlinear transformation and a linear transformation. Fig. 7 is described by taking as an example the case in which the first block cipher algorithm is the SM4 algorithm and the second block cipher algorithm is the AES algorithm. In this implementation, the nonlinear transformation shown in fig. 7 corresponds to the S-box specified in the SM4 algorithm standard, and the linear transformation corresponds to the L layer specified in the SM4 algorithm standard. The algebraic expression of the S-box is shown in formula (7) above, and the algebraic expression of the L layer is shown in formula (11) or (12) above. For convenience of description, the calculation flow of the round function in the SM4 encryption algorithm provided in the embodiments of the present application is described by taking as an example the case in which the input of the round function in the SM4 algorithm includes a round key $rk_i$ and the current internal state $(x_i, x_{i+1}, x_{i+2}, x_{i+3})$ of the SM4 algorithm, where $i = 0, 1, \ldots, 31$. As shown in fig. 7, the flow includes S710 to S760. Next, S710 to S760 are described in detail.

S710, in the ith iteration, the input data of S-box is mapped into AES finite field with the first instruction in the GFNI instruction set.

In any round of iterative computation, the input data of the S-box is determined according to the round key corresponding to that round and the current internal state of the SM4 algorithm in that round. When the round is the first round of iterative computation, the current internal state of the SM4 algorithm is the plaintext data to be encrypted that is input to the SM4 algorithm. For example, the input data of the S-box included in the round function can be calculated from the round key of the SM4 algorithm and the current internal state of the SM4 algorithm as specified in the SM4 standard; for details, refer to the related description above, which is not repeated here. For convenience of description, in the following, the input data of the S-box in the round function in the ith round of iterative computation is denoted as $(x_i, x_{i+1}, x_{i+2}, x_{i+3})$, $i = 0, 1, \ldots, 31$. The isomorphic matrix $M$ maps data in the SM4 finite field to data in the AES finite field. Based on this, the input data $(x_i, x_{i+1}, x_{i+2}, x_{i+3})$ of the S-box can be mapped into the AES finite field by the following formula:

where the mapped quantity in the formula represents the result obtained after mapping the input data of the S-box into the AES finite field.

For convenience of description, the following description of the embodiments of the present application takes $i = 0$ as an example. $i = 0$ can be understood as the initial state of the SM4 algorithm, in which the round function included in the SM4 algorithm has not yet performed any iteration. Optionally, $i$ may also be $1, 2, \ldots, 31$.
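How the S-box input is formed from the round key and the internal state can be sketched as follows. This minimal Python example assumes the published SM4 round structure (the three newest state words XORed with the round key, then split into four bytes, most significant first); `sbox_input_bytes` is a hypothetical name, and the state and round-key values are example numbers chosen only for illustration.

```python
def sbox_input_bytes(state, rk):
    """Return the four S-box input bytes for one round.

    state -- current internal state (X_i, X_{i+1}, X_{i+2}, X_{i+3}), 32-bit words
    rk    -- 32-bit round key of this round
    """
    _, x1, x2, x3 = state
    word = (x1 ^ x2 ^ x3 ^ rk) & 0xFFFFFFFF
    # Split big-endian: the most significant byte comes first (assumed order).
    return tuple(word.to_bytes(4, "big"))


# Example values: a 128-bit state and one round key.
example = sbox_input_bytes(
    (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210), 0xF12186F9
)
```

Each of the four bytes is then mapped into the AES finite field with the isomorphic matrix before the affine and inversion steps are applied.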

S720, in the AES finite field, perform a first affine transformation on the input data of the S-box, matrix 1, and matrix 2 by using the first instruction, to obtain a first affine transformation result.

Here, matrix 1 is $M^{-1} \cdot A_{SM4} \cdot M$, and matrix 2 is $C_{SM4} \cdot M$. The first affine transformation result, obtained by performing the first affine transformation on the input of the S-box, matrix 1, and matrix 2, can be expressed by the following formula:

In the above formula (22), when the matrix $A_{SM4}$ is defined as in formula (8) above, and the matrix $M^{-1}$ and the matrix $M$ are defined as in formula (6) above, matrix 1 and matrix 2 are defined as follows:

S730, in the AES finite field, invert the first affine transformation result by using the second instruction, to obtain the inverted result.

Here, the result of inverting the first affine transformation result can be expressed by the following formula:

S740, in the AES finite field, perform a second affine transformation on the inverted result, matrix 1, and matrix 2 by using the first instruction, to obtain a second affine transformation result.

Here, the second affine transformation result can be expressed by the following formula:

It is understood that S710 to S740 above correspond to the calculation flow of the S-box in the round function included in the SM4 algorithm.

S750, in the AES finite field, cyclically right-shift the second affine transformation result by 8 bits, 16 bits, and 24 bits respectively, to obtain a second affine transformation result #2, a second affine transformation result #3, and a second affine transformation result #4, respectively.

Here, the results obtained by cyclically right-shifting the second affine transformation result by 8 bits, 16 bits, and 24 bits respectively are as follows:

For example, the above cyclic shift operation may be implemented by using the cyclic shift instruction vprold. In this case, the operation of S750 above may be represented by the following instructions:

vprold(xmm5,xmm6,08h) (27)

vprold(xmm5,xmm7,010h) (28)

vprold(xmm5,xmm8,018h) (29)

In formulas (27) to (29) above, the xmm5 register stores the second affine transformation result, and the xmm6, xmm7, and xmm8 registers store the results of cyclically shifting it by 8 bits, 16 bits, and 24 bits, respectively.
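The effect of instructions (27) to (29) on a 128-bit register can be emulated as below. This Python sketch assumes the documented behaviour of VPROLD, which rotates each 32-bit lane to the left by the immediate count; whether that appears as a left or a right shift of the byte sequence depends on the byte order in which the words were loaded, which the text does not spell out. `vprold_128` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def rotl32(w, count):
    """Rotate a 32-bit word left by `count` bits."""
    count %= 32
    return ((w << count) | (w >> (32 - count))) & MASK32


def vprold_128(reg, count):
    """Rotate each of the four 32-bit lanes of a 128-bit value left."""
    out = 0
    for lane in range(4):
        w = (reg >> (32 * lane)) & MASK32
        out |= rotl32(w, count) << (32 * lane)
    return out


# The three shifted copies used by the linear layer (counts 08h, 010h, 018h).
src = 0x11223344
copies = [vprold_128(src, c) for c in (0x08, 0x10, 0x18)]
```

Because the counts are multiples of 8, each rotation only permutes the four bytes inside a lane, so the byte values produced by the S-box stay intact.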

S760, in the AES finite field, the data #1, the data #2, the data #3, and the data #4 are summed to obtain an output result of the linear transformation.

Here, data #1 is obtained by performing linear transformation #1 on the second affine transformation result #1 by using the first instruction; data #2 is obtained by performing linear transformation #2 on the second affine transformation result #2 by using the first instruction; data #3 is obtained by performing linear transformation #3 on the second affine transformation result #3 by using the first instruction; and data #4 is obtained by performing linear transformation #4 on the second affine transformation result #4 by using the first instruction. Linear transformation #1, linear transformation #2, linear transformation #3, and linear transformation #4 correspond respectively to the matrices $L_0$, $L_1$, $L_2$, and $L_3$ included in the preset matrix set. Based on this, the output result of the linear transformation can be expressed by the following formula:

where the four summed terms represent data #1, data #2, data #3, and data #4, respectively.

The above method of obtaining data #1 to data #4 corresponds to the linear transformation of the L layer defined in the SM4 algorithm standard. It is to be understood that the execution order of S750 above and the computation of data #1 to data #4 in S760 above may also be interchanged; that is, after S730, the cyclic right shift operation is performed on the result obtained in S730, and the affine transformation is then performed on the results of the cyclic right shift operation and on the result obtained in S730.
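The idea of splitting the L layer into four byte-wise linear maps applied to byte-rotated copies can be checked numerically. The Python sketch below compares the L layer as published in the SM4 standard, $L(B) = B \oplus (B \lll 2) \oplus (B \lll 10) \oplus (B \lll 18) \oplus (B \lll 24)$, with a decomposition into four per-byte maps. The maps `l0` to `l3` are derived here for illustration under a left-rotation, big-endian convention; they are assumptions and need not coincide with the matrices $L_0$ to $L_3$ of the application.

```python
def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def sm4_l(b):
    """L layer as given in the SM4 standard."""
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24)


def bytewise(f, word):
    """Apply a byte -> byte map to each of the four bytes of a word."""
    return int.from_bytes(bytes(f(v) for v in word.to_bytes(4, "big")), "big")


# Per-byte linear maps (each one is an 8x8 binary matrix in disguise).
def l0(v):
    return v ^ ((v << 2) & 0xFF)


def l1(v):
    return (v >> 6) ^ ((v << 2) & 0xFF)


l2 = l1


def l3(v):
    return v ^ (v >> 6)


def sm4_l_decomposed(b):
    """Sum (XOR) of four byte-wise maps on copies rotated by 0/8/16/24 bits."""
    return (
        bytewise(l0, b)
        ^ bytewise(l1, rotl32(b, 8))
        ^ bytewise(l2, rotl32(b, 16))
        ^ bytewise(l3, rotl32(b, 24))
    )


samples = (0, 1, 0x80000000, 0xF002C39E, 0x12345678, 0xFFFFFFFF)
agree = all(sm4_l(w) == sm4_l_decomposed(w) for w in samples)
```

Each per-byte map can be encoded as an 8 × 8 bit matrix and evaluated with one first instruction, which is consistent with the count of four first instructions for data #1 to data #4.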

It can be understood that the current iteration is the ith iteration calculation of the round function in the SM4 algorithm, and the above S720 is performed in the ith iteration in the above implementation. Optionally, in other implementations, S720 may also be performed at the previous iteration calculation before the ith iteration. For example, S720 may be performed after obtaining the result of the linear transformation is performed in the previous iteration calculation before the ith iteration.

In the implementation manner, when the calculation of the round function (including the calculation of the S-box and the calculation of the L layer) included in the SM4 algorithm is implemented based on the GFNI instruction set, only necessary algebraic operations are executed, so that the problem that additional operations are required to be introduced to counteract invalid operations is avoided, and the consumption of system resources can be reduced. Specifically, the calculation flow of the round function included in the SM4 algorithm can be implemented by only five instructions (i.e., 4 first instructions and 1 second instruction), so that the number of instructions required is reduced to a great extent, and the operation efficiency can be improved. The method can process the common number of the plaintext packets (for example, 8 packets of plaintext data, 16 packets of plaintext data or 32 packets of plaintext data, etc.), that is, the calculation of the SM4 algorithm is not required to be executed after 128 or 256 packets of plaintext data are packed together, so that the method has higher universality and practical value. The above method can be performed in a general purpose processor (which should be able to run the first and second instructions in the GFNI instruction set) without the need for dedicated hardware, making the method more versatile.

Fig. 8 is a schematic diagram of a method for implementing round functions in a block cipher algorithm according to an embodiment of the present application. In particular, the round function includes a non-linear transformation. In fig. 8, the first block cipher algorithm is SM4 algorithm, and the second block cipher algorithm is AES algorithm, in this implementation, the nonlinear transformation corresponds to S-boxes in the round function included in SM4 algorithm, and the algebraic expression of S-boxes is specifically shown in formula (7) above. As shown in fig. 8, the flow includes S810 to S850. Next, S810 to S850 are described in detail.

S810, convert the input data of the S-box to data on the AES finite field with the first instruction in the GFNI instruction set.

For convenience of description, in this embodiment the input data of the S-box is denoted as x, where x is 8-bit binary data; the isomorphic matrix that maps data in the SM4 finite field to data in the AES finite field is denoted as $M$; and the isomorphic matrix that maps data in the AES finite field to data in the SM4 finite field is denoted as $M^{-1}$. Based on this, when the input data of the S-box is x, the operation of mapping x into the AES finite field may be expressed by the following formula:

$$x_1 = x \cdot M \tag{31}$$

where $x_1$ represents the data obtained by mapping x from the SM4 finite field into the AES finite field; for the definitions of the matrix $M$ and the matrix $M^{-1}$, see formula (6) above.

S820, in the AES finite field, perform a first affine transformation on the data obtained in S810, the matrix $A_{SM4}'$, and the matrix $C_{SM4}'$ by using the first instruction, to obtain a first affine transformation result.

The matrix $A_{SM4}'$ is determined from the isomorphic matrix, the inverse of the isomorphic matrix, and the matrix $A_{SM4}$. Specifically, $A_{SM4}'$ is the result of dot-multiplying the isomorphic matrix, the inverse of the isomorphic matrix, and the matrix $A_{SM4}$; for the definition of $A_{SM4}'$, see formula (9) above.

Here, $M^{-1}$ denotes the inverse of the isomorphic matrix, and $M$ denotes the isomorphic matrix.

The matrix $C_{SM4}'$ is determined from the isomorphic matrix and the matrix $C_{SM4}$. Specifically, $C_{SM4}'$ is the result of dot-multiplying the isomorphic matrix and the matrix $C_{SM4}$; for the definition of $C_{SM4}'$, see formula (10) above.

In the AES finite field, the first affine transformation is performed on the data obtained in S810, the matrix $A_{SM4}'$, and the matrix $C_{SM4}'$ to obtain the first affine transformation result. The first affine transformation result $x_2$ can be expressed by the following formula:

$$x_2 = x_1 \cdot A_{SM4}' + C_{SM4}' \tag{32}$$

S830, in the AES finite field, invert the first affine transformation result using the second instruction to obtain an inversion result.

The first affine transformation result obtained in S820 is x₂. Inverting x₂ gives the inversion result x₃, which can be expressed by the following formula:

x₃ = (x₂)^-1 (33)
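The inversion in formula (33) can be illustrated with a short Python sketch. It assumes the AES field polynomial 0x11B and the convention that 0 maps to 0; `gf_mul` and `gf_inv` are hypothetical helper names, and the exponentiation used here is one simple way to compute the inverse, not necessarily how the instruction is realized in hardware.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1


def gf_mul(a, b):
    """Multiply two bytes in the AES field GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return r


def gf_inv(a):
    """Multiplicative inverse in the AES field, with 0 mapped to 0.

    Uses a^254 = a^-1, which holds because a^255 = 1 for every nonzero a.
    """
    if a == 0:
        return 0
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


if __name__ == "__main__":
    x2 = 0x53
    x3 = gf_inv(x2)
    print(hex(x3), gf_mul(x2, x3))
```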

S840, in the AES finite field, perform, using the first instruction, a second affine transformation on the inversion result with matrix A_SM4' and matrix C_SM4' to obtain a second affine transformation result.

Based on the results obtained in S810 to S830, the second affine transformation is performed in the AES finite field on the inversion result with matrix A_SM4' and matrix C_SM4'; the second affine transformation result x₄ can be expressed by the following formula:

x₄ = x₃·A_SM4' + C_SM4' (34)

S850, convert the second affine transformation result into data in the SM4 finite field using the first instruction, and take that data as the output data of the S-box.

Based on the result obtained in S840, the second affine transformation result is converted into data in the SM4 finite field, and that data is taken as the output data of the S-box; the mapped second affine transformation result y can be expressed by the following formula:

y = (x₄)^-1 (35)

In this implementation, five instructions (i.e., 4 first instructions and 1 second instruction) suffice to implement the calculation flow of the S-box in the round function of the SM4 algorithm, so the method does not consume a large amount of system resources and has high operation efficiency. The method can be executed on a general-purpose processor without special hardware, so it has high universality.

In this implementation, when the S-box computation in the round function of the SM4 algorithm is implemented based on the GFNI instruction set, only the necessary algebraic operations are executed, which avoids introducing additional operations to cancel out invalid ones and thus reduces the consumption of system resources. Specifically, the calculation flow of the S-box can be realized with only five instructions (i.e., 4 first instructions and 1 second instruction), which greatly reduces the number of instructions required and improves operation efficiency. The method can process commonly used numbers of plaintext blocks (for example, 8, 16 or 32 blocks of plaintext data); that is, there is no need to gather 128 or 256 blocks of plaintext data before executing the SM4 computation, so the method has higher universality and practical value. The method can be performed on a general-purpose processor (one able to run the first and second instructions of the GFNI instruction set) without dedicated hardware, which makes the method more versatile.

Fig. 9 is a schematic diagram of a method for implementing the round function in a block cipher algorithm according to an embodiment of the present application. Specifically, the round function includes a nonlinear transformation, and the nonlinear transformation includes an affine transformation and an inverse affine transformation. Fig. 9 is described taking as an example the case where the first block cipher algorithm is the SM4 algorithm and the second block cipher algorithm is the AES algorithm; in this implementation, the nonlinear transformation shown in fig. 9 corresponds to the S-box and the L-layer specified in the SM4 algorithm standard. The algebraic expression of the S-box is shown in formula (7), and the algebraic expression of the L-layer is shown in formula (11) or formula (12). For convenience of description, the calculation flow of the round function in the SM4 encryption algorithm is described below taking as an example the case where the input data of the nonlinear transformation in the i-th round of iterative computation is (x_i, x_{i+1}, x_{i+2}, x_{i+3}). As shown in fig. 9, the flow includes S910 to S940, which are described in detail next.

S910, in the i-th round of iteration, map the input data of the nonlinear transformation into the AES finite field using a first instruction in the GFNI instruction set.

For convenience of description, in this embodiment the input data of the nonlinear transformation in the round function of the SM4 algorithm in the i-th round of iteration is denoted (x_i, x_{i+1}, x_{i+2}, x_{i+3}), i = 0, 1, ..., 31.

It can be understood that S910 works on the same principle as S710, except that the input data of the S-box in S710 is replaced by the input data of the nonlinear transformation in this embodiment; for details not described here, see the description of S710.

S920, in the AES finite field, affine transformation is performed on the input data of the nonlinear transformation, the matrix 1, and the matrix 2 with the first instruction, obtaining an affine transformation result.

It can be understood that S920 works on the same principle as S720, except that the input data of the S-box in S720 is replaced by the input data of the nonlinear transformation in this embodiment; for details not described here, see the description of S720.

S930, in the AES finite field, perform a cyclic shift operation on the affine transformation result to obtain cyclic shift operation results.

In the AES finite field, performing the cyclic shift operation on the affine transformation result to obtain the cyclic shift operation results includes: cyclically right-shifting the affine transformation result by 0 bits, 8 bits, 16 bits and 24 bits to obtain the corresponding 4 cyclic shift operation results. Illustratively, when the affine transformation result obtained in S920 is

the corresponding 4 cyclic shift operation results can be written as:

S940, in the AES finite field, perform an inverse affine transformation on the cyclic shift operation results with data 1 and data 2 using the second instruction to obtain the result of the nonlinear transformation.

Data 1 is data obtained by mapping the dot product of the preset matrix set and the preset matrix into the second finite field. For example, when the preset matrix set includes the 4 preset matrices shown in formula (13) above (i.e., L₀, L₁, L₂ and L₃), data 1 may comprise three matrices, which are defined by formulas (37) to (39):

Data 2 is data obtained by mapping the preset matrix set and the preset vector into the second finite field. Optionally, data 2 is data obtained by mapping only some of the preset matrices in the preset matrix set, together with the preset vector, into the second finite field. For example, when the preset matrix set includes the 4 preset matrices shown in formula (13) above (i.e., L₀, L₁, L₂ and L₃), data 2 can be represented by the following formula:

C_SM4·(L₀ + L₃)·M (40)

On this basis, the result of the nonlinear transformation can be expressed by the following formula:

In this implementation, when the round function of the SM4 algorithm is computed based on the GFNI instruction set, only the necessary algebraic operations are executed, which avoids introducing additional operations to cancel out invalid ones and thus reduces the consumption of system resources. Specifically, the calculation flow of the round function of the SM4 algorithm can be implemented with only 3 instructions (i.e., 2 first instructions and 1 second instruction), which greatly reduces the number of instructions required and improves operation efficiency. The method can process commonly used numbers of plaintext blocks (for example, 8, 16 or 32 blocks of plaintext data); that is, there is no need to gather 128 or 256 blocks of plaintext data before executing the SM4 computation, so the method has higher universality and practical value. The method can be performed on a general-purpose processor (one able to run the first and second instructions of the GFNI instruction set) without dedicated hardware, which makes the method more versatile.

The schematic diagram of the round function implementation in the block cipher algorithm provided in the embodiment of the present application is described in detail above with reference to fig. 9, and the method described in fig. 9 is implemented by using the GFNI instruction set.

For illustrative purposes, the following describes how to implement the above simplified round function calculation flow using the GFNI instruction set, taking as an example an assembler API whose operands are 128-bit wide registers. A 128-bit wide register can handle 4 blocks of the SM4 algorithm simultaneously.

First, the round key rk_i and the current internal state (x_i, x_{i+1}, x_{i+2}, x_{i+3}) of the SM4 algorithm, i = 0, 1, ..., 31, are loaded into a 128-bit register (for example, register xmm0), and the matrix M is loaded, in the form of column vectors, into both the upper 64 bits and the lower 64 bits of another register (let this register be xmm1). With the output register set to xmm2, the following operation is performed:

VGF2P8AFFINEQB(xmm0,xmm1,xmm2,00h) (42)
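To make the effect of this instruction concrete, the following Python sketch emulates a 128-bit affine operation byte by byte. It follows the per-byte definition given in Intel's instruction reference for GF2P8AFFINEQB as understood here (result bit i is the parity of matrix byte 7-i ANDed with the source byte, XORed with bit i of the immediate). The function names and the byte-string representation of registers are assumptions made for illustration only.

```python
IDENTITY_QWORD = 0x0102040810204080  # bit matrix that leaves every byte unchanged


def affine_byte(matrix_qword, x, imm8):
    """Apply an 8x8 GF(2) bit matrix plus constant to one byte."""
    out = 0
    for i in range(8):
        row = (matrix_qword >> (8 * (7 - i))) & 0xFF  # byte[7 - i] of the qword
        parity = bin(row & x).count("1") & 1
        out |= (parity ^ ((imm8 >> i) & 1)) << i
    return out


def gf2p8affineqb_128(src, matrix, imm8):
    """Emulate the operation on a 16-byte register.

    src and matrix are 16-byte strings; each 64-bit half of `matrix` is the
    bit matrix applied to the 8 bytes in the same half of `src`.
    """
    out = bytearray(16)
    for lane in range(2):
        qword = int.from_bytes(matrix[8 * lane:8 * lane + 8], "little")
        for k in range(8):
            out[8 * lane + k] = affine_byte(qword, src[8 * lane + k], imm8)
    return bytes(out)


if __name__ == "__main__":
    # Same matrix in the upper and lower 64 bits, as in the step above.
    matrix = IDENTITY_QWORD.to_bytes(8, "little") * 2
    state = bytes(range(16))
    print(gf2p8affineqb_128(state, matrix, 0x00).hex())
```

Loading the same matrix into both 64-bit halves, as the text does for M, makes all 16 bytes of the state go through the same transformation.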

Then, the S-box inputs of the four blocks (each input contains 32 bits, i.e. 4 × 8 bits) are computed according to the round function structure of the SM4 algorithm standard, and the four inputs are placed in turn into a 128-bit register (let this register be xmm3).

Then, the matrix

is loaded, in the form of column vectors, into another 128-bit register (let this register be xmm4). With the output register set to xmm5, the following instruction operation is performed:

VGF2P8AFFINEQB(xmm3,xmm4,xmm5,0ceh) (43)

Then, a cyclic right shift operation is first performed on the result output in register xmm5. This may include cyclically right-shifting the result in xmm5 by 8, 16 and 24 bits, respectively. For example, the cyclic right shift operation may be implemented with the following instructions:

vprold(xmm5,xmm6,08h) (44)

vprold(xmm5,xmm7,010h) (45)

vprold(xmm5,xmm8,018h) (46)

where the vprold instruction is used as the cyclic right shift instruction on 32-bit words. Alternatively, the operation of the vprold instruction may be implemented with other instructions, which is not particularly limited.

Then, three matrices are each placed into both the upper 64 bits and the lower 64 bits of a register. Let the registers holding them be xmm9, xmm10 and xmm11, respectively, and execute the following instruction operations:

VGF2P8AFFINEINVQB(xmm5,xmm9,xmm5,00h) (47)

VGF2P8AFFINEINVQB(xmm6,xmm10,xmm6,00h) (48)

VGF2P8AFFINEINVQB(xmm7,xmm10,xmm7,00h) (49)

VGF2P8AFFINEINVQB(xmm8,xmm11,xmm8,0e7h) (50)

Finally, the contents of registers xmm5, xmm6, xmm7 and xmm8 are XORed to obtain the output of this substep, i.e., the output of the L-layer operation.
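The idea of obtaining the L-layer output by XORing four byte-rotated copies, each passed through a per-byte linear map, can be checked with a small Python sketch. It assumes the L transformation of the SM4 standard, L(B) = B ⊕ (B<<<2) ⊕ (B<<<10) ⊕ (B<<<18) ⊕ (B<<<24), and big-endian byte order within a word. The sketch covers only the linear part (no field mapping and no S-box affine constants), rotates left rather than right, and uses hypothetical function names, so it is an illustration of the decomposition rather than the exact register-level procedure above.

```python
MASK32 = 0xFFFFFFFF


def rotl32(word, bits):
    """Rotate a 32-bit word left by `bits`."""
    bits %= 32
    return ((word << bits) | (word >> (32 - bits))) & MASK32


def sm4_l(word):
    """Reference L transformation as given in the SM4 standard."""
    return (word ^ rotl32(word, 2) ^ rotl32(word, 10)
            ^ rotl32(word, 18) ^ rotl32(word, 24))


def per_byte(word, f):
    """Apply a byte-to-byte function to each of the 4 bytes of a word."""
    data = word.to_bytes(4, "big")
    return int.from_bytes(bytes(f(b) & 0xFF for b in data), "big")


def sm4_l_bytewise(word):
    """Same result from four byte rotations, each followed by a per-byte linear map."""
    t0 = per_byte(word, lambda b: b ^ (b << 2))
    t1 = per_byte(rotl32(word, 8), lambda b: (b >> 6) ^ (b << 2))
    t2 = per_byte(rotl32(word, 16), lambda b: (b >> 6) ^ (b << 2))
    t3 = per_byte(rotl32(word, 24), lambda b: (b >> 6) ^ b)
    return t0 ^ t1 ^ t2 ^ t3


if __name__ == "__main__":
    w = 0x01234567
    print(hex(sm4_l(w)), hex(sm4_l_bytewise(w)))
```

In this sketch the rotations by 8 and 16 bits use the same per-byte map, which mirrors the way a single matrix register (xmm10) serves two of the four instructions above.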

It should be noted that the above steps are described taking as an example the case where the cyclic right shift operation is performed first and the VGF2P8AFFINEINVQB instruction is then executed on the results of the cyclic right shift. Optionally, the order may be reversed: the VGF2P8AFFINEINVQB instruction operation is executed first, and the cyclic right shift operation is then performed on its result.

After the 32 rounds of iterative computation have been performed, the iterative computation results are loaded into four 128-bit registers, and the matrix M^-1 is loaded, in the form of column vectors, into both the upper 64 bits and the lower 64 bits of another 128-bit register (i.e., the xmm13 register). With the result of any one of these rounds of iterative computation loaded into register xmm11 and the output ciphertext data placed in register xmm12, the following instruction is executed to obtain the ciphertext data:

VGF2P8AFFINEQB(xmm11,xmm12,xmm13,00h) (51)

After the instruction operation of formula (51) above has been applied, the processed results are arranged in reverse order to obtain the ciphertext data, i.e., the encryption result output by the SM4 algorithm.

Fig. 10A is a schematic diagram of a method for implementing the SM4 encryption algorithm according to an embodiment of the present application. Specifically, the round function of the SM4 algorithm includes an S-box and an L-layer. The algebraic expression of the S-box is shown in formula (7) above, and the algebraic expression of the L-layer is shown in formula (11) or formula (12) above. For convenience of description, the method for implementing the SM4 encryption algorithm is described below taking as an example the case where, in the i-th round of iterative computation, the inputs of the round function of the SM4 encryption algorithm are the round key rk_i and the current internal state (x_i, x_{i+1}, x_{i+2}, x_{i+3}) of the SM4 algorithm, i = 0, 1, ..., 31. As shown in fig. 10A, the flow includes S1001 to S1009, which are described in detail next.

S1001, obtain 128-bit plaintext data to be encrypted, and determine, from the 128-bit plaintext data and a round key, the input data of the S-box for the i-th round of iterative computation of the round function of the SM4 encryption algorithm, where i = 0, 1, 2, ..., 31.

The 128-bit plaintext data to be encrypted can be understood as the internal state of the SM4 algorithm when the 1st round of iterative computation is performed. That is, determining the input data of the S-box for the i-th round of iterative computation from the 128-bit plaintext data and the round key can be understood as obtaining that input data from the round key corresponding to the i-th round and the internal state of the SM4 algorithm in the i-th round. In one example, the input data of the S-box in the i-th round may be obtained from the corresponding round key and internal state in the manner specified in the SM4 algorithm standard. For convenience of description, the 128-bit plaintext data input to the SM4 algorithm is hereinafter denoted (X₀, X₁, X₂, X₃); (X₀, X₁, X₂, X₃) is also the internal state of the SM4 algorithm in the 1st round of iteration. Taking the 1st round of iterative computation as an example, the round key rk₀ of the 1st round is XORed with X₁, X₂ and X₃ to obtain 32-bit data, and the 32-bit data is split into four 8-bit values, i.e. the input data (x₀, x₁, x₂, x₃) of the S-box in the 1st round of iterative computation. The XOR of the round key rk₀ with X₁, X₂ and X₃ can be expressed as X₁ ⊕ X₂ ⊕ X₃ ⊕ rk₀.

Taking the 2nd round of iterative computation as an example, the round key rk₁ of the 2nd round is XORed with X₂, X₃ and X₄ to obtain 32-bit data, and the 32-bit data is split into four 8-bit values, i.e. the input data (x₁, x₂, x₃, x₄) of the S-box in the 2nd round of iterative computation, where X₄ is the result of the 1st round of iterative computation. By analogy, the input data of the S-box in rounds 3, ..., 32 is obtained. The round key rk_i is determined from the encryption key, and the manner of obtaining the round key is not particularly limited. For example, the round key may be obtained from the encryption key in the manner specified in the SM4 algorithm standard.
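The derivation of the S-box input from the internal state and the round key can be sketched in a few lines of Python. The function name `sbox_input` and the sample values are hypothetical, and the big-endian split of the 32-bit word into four bytes is an assumption consistent with the usual word convention of SM4.

```python
MASK32 = 0xFFFFFFFF


def sbox_input(x1, x2, x3, round_key):
    """XOR three state words with the round key and split the result into 4 bytes."""
    word = (x1 ^ x2 ^ x3 ^ round_key) & MASK32
    return tuple(word.to_bytes(4, "big"))


if __name__ == "__main__":
    # Arbitrary sample words, not taken from any standard test vector.
    print(sbox_input(0x11111111, 0x22222222, 0x44444444, 0x000000FF))
```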

It is understood that the input data of the S-box in the above-described S1001 is data in the SM4 finite field. When computing the SM4 encryption algorithm, 32 iterative computations need to be performed on the round functions included in the SM4 encryption algorithm. The principle of any one iteration calculation in the 32 iterations is the same, and only the input data and the output result of the round function are different in any one iteration calculation. Next, the calculation flow of the round function in the ith round of iteration is described in detail with reference to S1002 to S1006.

S1002, in the AES finite field, perform an affine transformation on the input data of the S-box with matrix 1 and matrix 2 using instruction 1 to obtain an affine transformation result.

Instruction 1 is the VGF2P8AFFINEQB instruction. Matrix 1 is denoted A₁; for the definition of A₁, see equation (15) above. Matrix 2 is denoted C₁; for the definition of C₁, see equation (16) above.

In this embodiment, the input data of the S-box in the i-th round of iterative computation is denoted (x_i, x_{i+1}, x_{i+2}, x_{i+3}), i = 0, 1, ..., 31. On this basis, performing the affine transformation on the input data of the S-box with matrix 1 and matrix 2 using instruction 1 to obtain the affine transformation result means implementing the calculation of the following formula using instruction 1:

where

represents the affine transformation result.

S1003, in the AES finite field, perform an inverse affine transformation on the affine transformation result with matrix A₂ and the preset vector C_SM4 using instruction 2 to obtain the output result of the S-box.

For the definition of the preset vector C_SM4, see formula (8) above; for the definition of matrix A₂, see equation (18) above.

where (x_i, x_{i+1}, x_{i+2}, x_{i+3})' denotes the output result of the S-box.

S1004, in the AES finite field, perform a cyclic shift operation on the output result of the S-box to obtain cyclic shift operation results.

In the AES finite field, performing the cyclic shift operation on the output result of the S-box to obtain the cyclic shift operation results includes: cyclically right-shifting the output result of the S-box by 0 bits, 8 bits, 16 bits and 24 bits to obtain the corresponding 4 cyclic shift operation results. When the output result of the S-box is denoted (x_i, x_{i+1}, x_{i+2}, x_{i+3})', the 4 cyclic shift operation results can be expressed as (x_i, x_{i+1}, x_{i+2}, x_{i+3})', (x_{i+3}, x_i, x_{i+1}, x_{i+2})', (x_{i+2}, x_{i+3}, x_i, x_{i+1})' and (x_{i+1}, x_{i+2}, x_{i+3}, x_i)', respectively.
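The four cyclic shift results listed above are simply byte rotations of a 32-bit word. The Python sketch below shows this, assuming the four bytes are packed big-endian into one word; `rotr32` and `cyclic_shifts` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def rotr32(word, bits):
    """Rotate a 32-bit word right by `bits`."""
    bits %= 32
    return ((word >> bits) | (word << (32 - bits))) & MASK32


def cyclic_shifts(word):
    """Return the word rotated right by 0, 8, 16 and 24 bits."""
    return [rotr32(word, n) for n in (0, 8, 16, 24)]


if __name__ == "__main__":
    # Bytes 0A, 0B, 0C, 0D stand in for the four S-box output bytes.
    for shifted in cyclic_shifts(0x0A0B0C0D):
        print(f"{shifted:08X}")
```

With the bytes 0A, 0B, 0C, 0D standing for the four S-box outputs, the rotation by 8 bits moves the last byte to the front, matching the second tuple in the text.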

S1005, in the AES finite field, perform an inverse affine transformation on the cyclic shift operation results with data 1 and data 2 using instruction 2 to obtain the output result of the L-layer.

Data 1 is data obtained by mapping the dot product of the preset matrix set and the preset matrix into the second finite field. When the preset matrix set includes the 4 preset matrices shown in formula (13) above (i.e., L₀, L₁, L₂ and L₃), data 1 may include three matrices, whose definitions can be seen in equations (37) to (39) above. Data 2 is data obtained by mapping the preset matrix set and the preset vector into the second finite field. Optionally, data 2 is data obtained by mapping only some of the preset matrices in the preset matrix set, together with the preset vector, into the second finite field. The definition of data 2 can be seen in equation (40) above.

In the AES finite field, performing the inverse affine transformation on the cyclic shift operation results with data 1 and data 2 using instruction 2 to obtain the output result of the L-layer means implementing the calculation of the following formula using instruction 2:

The result of the nonlinear transformation can be expressed by the following formula:

Instruction 2 may be the VGF2P8AFFINEINVQB instruction in the GFNI instruction set.
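For orientation, the following Python sketch emulates the per-byte behaviour of an inverse-then-affine instruction of this kind: the input byte is first inverted in the AES field (with 0 mapped to 0) and the result is then passed through an 8×8 bit matrix plus a constant. It follows Intel's published per-byte definition as understood here. As a sanity check it uses the widely known matrix and constant that reproduce the AES S-box; those are not the SM4 matrices of this application, and all function names are hypothetical.

```python
AES_POLY = 0x11B
AES_SBOX_MATRIX = 0xF1E3C78F1F3E7CF8  # AES affine matrix in the instruction's layout
AES_SBOX_CONST = 0x63


def gf_mul(a, b):
    """Multiply two bytes in the AES field GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return r


def gf_inv(a):
    """Inverse in the AES field by exhaustive search, with 0 mapped to 0."""
    if a == 0:
        return 0
    return next(y for y in range(1, 256) if gf_mul(a, y) == 1)


def affine_byte(matrix_qword, x, imm8):
    """Result bit i = parity(matrix byte[7 - i] AND x) XOR bit i of imm8."""
    out = 0
    for i in range(8):
        row = (matrix_qword >> (8 * (7 - i))) & 0xFF
        out |= ((bin(row & x).count("1") & 1) ^ ((imm8 >> i) & 1)) << i
    return out


def affine_inv_byte(x, matrix_qword, imm8):
    """Invert x in the AES field, then apply the affine transformation."""
    return affine_byte(matrix_qword, gf_inv(x), imm8)


if __name__ == "__main__":
    for x in (0x00, 0x01, 0x53):
        print(hex(x), hex(affine_inv_byte(x, AES_SBOX_MATRIX, AES_SBOX_CONST)))
```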

S1006, perform an exclusive OR operation on the result of the nonlinear transformation and the input data of the S-box to obtain the result of the i-th round of iterative computation.

Performing the exclusive OR operation on the result of the nonlinear transformation and the input data of the S-box to obtain the result of the i-th round of iterative computation means performing the following operation:

It can be understood that after the i-th round of iterative computation is performed, the internal state of the SM4 algorithm changes from (x_i, x_{i+1}, x_{i+2}, x_{i+3}) to (x_{i+1}, x_{i+2}, x_{i+3}, x_{i+4}). That is, when the (i+1)-th round of iteration is performed, the internal state of the SM4 algorithm is (x_{i+1}, x_{i+2}, x_{i+3}, x_{i+4}).

S1007, it is determined whether i is equal to 31.

As can be seen from the working principle of the SM4 algorithm, when i is equal to 31, it is the last round of iterative computation (i.e., 32 th round of iterative computation) performed on the round function included in the SM4 algorithm. That is, if the current iterative computation is the last iterative computation performed on the round function, the iterative computation needs to be stopped thereafter, and ciphertext data is obtained according to the result of the last iterative computation. If the current iteration calculation is not the last iteration calculation performed on the round function, then the next iteration calculation needs to be performed continuously thereafter. Based thereon, determining whether i equals 31 includes: if it is determined that i is equal to 31, S1008 is performed after S1007; if it is determined that i is not equal to 31, S1009 is performed after S1007.

S1008, map the results obtained by the (i-3)-th, (i-2)-th, (i-1)-th and i-th rounds of iterative computation to the SM4 finite field using instruction 1, and arrange the mapping results in reverse order to obtain the ciphertext data.

Mapping the results obtained by the (i-3)-th, (i-2)-th, (i-1)-th and i-th rounds of iterative computation to the SM4 finite field using instruction 1 means implementing the calculation of the following formula with instruction 1 for each result:

where X₃₂', X₃₃', X₃₄' and X₃₅' are the results obtained by the (i-3)-th, (i-2)-th, (i-1)-th and i-th rounds of iterative computation, respectively.

When the mapping results of the results obtained by the (i-3)-th, (i-2)-th, (i-1)-th and i-th rounds of iterative computation are X₃₂, X₃₃, X₃₄ and X₃₅, respectively, the result of the reverse-order arrangement is X₃₅, X₃₄, X₃₃ and X₃₂; that is, the ciphertext data is (X₃₅, X₃₄, X₃₃, X₃₂).

S1009, the calculation of the i +1 th iteration round function is performed.

The (i+1)-th round of the round function is calculated on the same principle as the i-th round, except that the input data and the output result of the round function differ. In the (i+1)-th round of iterative computation, the input data of the round function is the result of the i-th round of iterative computation. The input data of the round function may be understood as the input data of the S-box in the round function. For details of the method corresponding to S1009, refer to the calculation flow of the i-th round of the round function in S1002 to S1006.

The apparatus that executes the methods of S1001 to S1009 described above may be a general-purpose processor that supports the GFNI instruction set, for example, the processor may be a CPU.

It can be understood that the above implementation is described taking the SM4 encryption algorithm as an example. The encryption algorithm and the decryption algorithm of SM4 work on the same principle, except that the decryption algorithm uses the round keys in the reverse of the order used by the encryption algorithm. When the SM4 encryption algorithm is implemented with the above method, the round keys used in the 32 rounds of iterative computation of the round function are (rk₀, rk₁, ..., rk₃₁). Therefore, replacing the round keys (rk₀, rk₁, ..., rk₃₁) in the above method with (rk₃₁, rk₃₀, ..., rk₀) implements the SM4 decryption algorithm.
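The claim that decryption only requires reversing the round key order follows from the iteration structure and can be demonstrated with a toy Python sketch. The iteration X_{i+4} = X_i ⊕ T(X_{i+1} ⊕ X_{i+2} ⊕ X_{i+3} ⊕ rk_i) and the final reverse-order output are taken as the SM4 structure. The mixing function `toy_t` and the sample round keys are placeholders chosen purely for illustration and are NOT the real SM4 T transformation or key schedule.

```python
MASK32 = 0xFFFFFFFF


def toy_t(word):
    """Placeholder mixing function, NOT the real SM4 T transformation."""
    return ((word * 0x9E3779B1) ^ (word >> 13)) & MASK32


def sm4_like_rounds(block, round_keys, t=toy_t):
    """Run the SM4-style iteration and return the last four words in reverse order."""
    x = list(block)
    for rk in round_keys:
        # New word = oldest state word XOR t(XOR of the other three and the round key).
        x.append(x[-4] ^ t(x[-3] ^ x[-2] ^ x[-1] ^ rk))
    return tuple(reversed(x[-4:]))


if __name__ == "__main__":
    # Made-up round keys; a real implementation derives them from the cipher key.
    round_keys = [(i * 0x01010101 + 0xA5) & MASK32 for i in range(32)]
    plaintext = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)
    ciphertext = sm4_like_rounds(plaintext, round_keys)
    recovered = sm4_like_rounds(ciphertext, round_keys[::-1])
    print(recovered == plaintext)
```

The round trip works for any choice of the mixing function, which is why the same round function code can serve both encryption and decryption.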

In the above implementation, each round of iterative computation of the round function needs only 3 instructions of the GFNI instruction set (i.e., 1 VGF2P8AFFINEQB instruction and 2 VGF2P8AFFINEINVQB instructions). After the results of the last 4 rounds of iterative computation are obtained, 1 instruction is used to map these 4 results to the SM4 finite field to obtain the ciphertext data. The method therefore needs only a small number of instructions to realize the SM4 encryption algorithm and can improve its operation efficiency. The method can process commonly used numbers of plaintext blocks (for example, 8, 16 or 32 blocks of plaintext data); that is, there is no need to gather 128 or 256 blocks of plaintext data before executing the SM4 computation, so the method has higher universality and practical value. When the method is executed by a general-purpose processor, the processor does not need hardware dedicated to computing the SM4 algorithm, so the method has better universality.

It should be understood that the specific embodiments shown in fig. 6 to 10A are only illustrative and do not limit the method for implementing the block cipher algorithm provided in the present application.

The method for implementing the block cipher algorithm provided in the present application is described in detail above with reference to fig. 1 to 10A. Next, an apparatus and an electronic device for implementing the block cipher algorithm provided in the present application are described with reference to fig. 10 and fig. 11. It should be understood that the above block cipher algorithm implemented method corresponds to the following block cipher algorithm implemented apparatus and electronic device. Therefore, the content which is not described in detail below can be referred to the relevant description in the above method embodiments.

The embodiment of the present application provides a device for implementing a block cipher algorithm, which corresponds to the method for implementing a block cipher algorithm provided by the embodiment of the present application.

Fig. 10 is a block diagram of an apparatus for implementing a block cipher algorithm according to an embodiment of the present application. As shown in fig. 10, the apparatus includes an acquisition unit 1001, a processing unit 1002 and an output unit 1003.

In some implementations, the apparatus is configured to implement the encryption algorithm flow corresponding to the block cipher algorithm in the foregoing method embodiment. The steps performed by the acquisition unit 1001, the processing unit 1002 and the output unit 1003 when executing the encryption algorithm are described next.

An acquisition unit 1001 configured to acquire plaintext data to be encrypted; a processing unit 1002, configured to perform encryption processing on plaintext data to be encrypted by using a first block cipher algorithm to obtain ciphertext data, where calculation of a round function in the first block cipher algorithm is implemented by using an instruction set, where the round function includes nonlinear transformation, and the instruction set includes an instruction for solving the nonlinear transformation; an output unit 1003, configured to output the ciphertext data.

Optionally, the processing unit 1002 is further configured to: obtain input data of the nonlinear transformation, wherein the input data of the nonlinear transformation is determined according to a round key and the plaintext data, the input data of the nonlinear transformation is data in a first finite field, and the first finite field is the finite field of the first block cipher algorithm; in a second finite field, implement the nonlinear transformation on the input data of the nonlinear transformation by using the instruction set to obtain a result of the nonlinear transformation, wherein the result of the nonlinear transformation is data in the second finite field, the second finite field is the finite field of a second block cipher algorithm, the second finite field has an isomorphic relationship with the first finite field, and the first block cipher algorithm is different from the second block cipher algorithm; and obtain the ciphertext data according to the result of the nonlinear transformation, the ciphertext data being data in the first finite field.

Optionally, the non-linear transformation includes a first affine transformation and a first inverse affine transformation, and the processing unit 1002 is further configured to: performing the first affine transformation on first data, second data and the input data of the non-linear transformation by using a first instruction in the instruction set, to obtain the first affine transformation result, wherein the first data is data for mapping a preset matrix into the second finite field according to an isomorphic matrix, the second data is data for mapping a preset vector into the second finite field according to the isomorphic matrix, and the isomorphic matrix is used for indicating the isomorphic relationship; and executing the first inverse affine transformation on the preset matrix, the preset vector and the first affine transformation result by using a second instruction in the instruction set to obtain a result of the nonlinear transformation.

Optionally, the non-linear transformation includes a first affine transformation, a second affine transformation, and an inverse transformation, and the processing unit 1002 is further configured to: performing a first affine transformation on first data, second data and a mapping result of the input data by using a first instruction in the instruction set, to obtain the first affine transformation result, wherein the mapping result of the input data is data that maps the input data into the second finite field according to an isomorphic matrix, the first data is data that maps a preset matrix into the second finite field according to an isomorphic matrix, the second data is data that maps a preset vector into the second finite field according to the isomorphic matrix, and the isomorphic matrix is used for indicating the isomorphic relationship; performing the inverse transformation on the first affine transformation result by using a second instruction in the instruction set to obtain an inverse transformation result; performing a second affine transformation on the first data, the second data, and the result of the inverse transformation using the first instruction, obtaining a result of the non-linear transformation.

Optionally, the round function further includes a linear transformation, and the processing unit 1002 is further configured to: performing the linear transformation on the result of the nonlinear transformation to obtain a result of the linear transformation; performing an exclusive-or operation on the result of the linear transformation and the internal state of the first block cipher algorithm to obtain an output result of the round function; and when the round key is a key used by the last round of iterative computation in the first block cipher algorithm, performing reverse order arrangement on mapping results of round function output results obtained by the last four rounds of iterative computation in the first block cipher algorithm, and determining the results obtained by the reverse order arrangement as the ciphertext data, wherein the mapping results of the round function output results obtained by any round of iterative computation are results obtained by mapping the round function output results obtained by any round of iterative computation to the first finite field by using the isomorphic matrix.

Optionally, the processing unit 1002 is further configured to: performing a cyclic shift operation on the result of the nonlinear transformation to obtain a cyclic shift operation result; and executing the linear transformation on the cyclic shift operation result according to a preset matrix set by using the first instruction to obtain a linear transformation result, wherein the number of preset matrices included in the preset matrix set is related to the cyclic shift operation result.
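To make the combination of cyclic shifts and per-byte matrices concrete, the following Python sketch takes the standard SM4 linear transformation L(B) = B ^ (B <<< 2) ^ (B <<< 10) ^ (B <<< 18) ^ (B <<< 24) and rewrites it as four whole-byte rotations, each followed by a GF(2)-linear map applied to every byte independently. This is one possible decomposition, offered under the assumption that it resembles the idea described above; the names `BYTE_MAPS` and `sm4_linear_bytewise` are hypothetical, and the maps are not claimed to be the preset matrix set of the embodiment.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits (0 <= n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def sm4_linear(b: int) -> int:
    """Reference form of the SM4 linear transformation L."""
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24)


def per_byte(word: int, f) -> int:
    """Apply the byte-to-byte map f to each of the four bytes of a word."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= f((word >> shift) & 0xFF) << shift
    return out


def hi(x: int) -> int:
    """Bits of a byte that stay in the same byte after a 2-bit left rotation."""
    return (x << 2) & 0xFF


def lo(x: int) -> int:
    """Bits of a byte that spill into the neighbouring byte after a 2-bit left rotation."""
    return x >> 6


# One GF(2)-linear byte map per whole-byte rotation (0, 8, 16, 24 bits).
# Each map could be written as an 8x8 bit matrix with a zero constant.
BYTE_MAPS = [
    lambda x: x ^ hi(x),      # applied to the word rotated by 0 bits
    lambda x: hi(x) ^ lo(x),  # applied to the word rotated by 8 bits
    lambda x: hi(x) ^ lo(x),  # applied to the word rotated by 16 bits
    lambda x: x ^ lo(x),      # applied to the word rotated by 24 bits
]


def sm4_linear_bytewise(b: int) -> int:
    """L(B) as byte rotations followed by per-byte linear maps."""
    out = 0
    for k, f in enumerate(BYTE_MAPS):
        out ^= per_byte(rotl32(b, 8 * k), f)
    return out
```

In this form the number of byte maps equals the number of distinct whole-byte rotations, which is one way the size of a matrix set could depend on the cyclic shifts involved.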

Optionally, the non-linear transformation includes a first affine transformation and a first inverse affine transformation, and the processing unit 1002 is further configured to: performing the first affine transformation on first data, second data and a mapping result of the input data by using a first instruction in the instruction set, to obtain the first affine transformation result, wherein the mapping result of the input data is data that maps the input data into the second finite field according to an isomorphic matrix, the first data is data that maps a preset matrix into the second finite field according to an isomorphic matrix, and the second data is data that maps a preset vector into the second finite field according to the isomorphic matrix, and the isomorphic matrix is used for indicating the isomorphic relationship; performing a cyclic shift operation on the first affine transformation result to obtain a cyclic shift operation result; performing the first inverse affine transformation on the cyclic shift operation result, third data and fourth data by using a second instruction in the instruction set, and obtaining a result of the nonlinear transformation, wherein the third data is data mapping a preset matrix set and a dot product result of the preset matrix into the second finite field, and the fourth data is data mapping the preset matrix set and the preset vector into the second finite field.

Optionally, the processing unit 1002 is further configured to: performing an exclusive or operation on a result of the nonlinear transformation and an internal state of the first block cipher algorithm to obtain an output result of the round function, the internal state of the first block cipher algorithm being associated with input data of the nonlinear transformation; and when the round key is a key used by the last round of iterative computation in the first block cipher algorithm, performing reverse order arrangement on mapping results of round function output results obtained by the last four rounds of iterative computation in the first block cipher algorithm, and determining the results obtained by the reverse order arrangement as the ciphertext data, wherein the mapping results of the round function output results obtained by any round of iterative computation are results obtained by mapping the round function output results obtained by any round of iterative computation to the first finite field by using the isomorphic matrix.

Optionally, the processing unit 1002 is further configured to: mapping the input data of the nonlinear transformation to the second finite field by using a first instruction in the instruction set and the isomorphic matrix to obtain a mapping result of the input data.

Optionally, in another implementation manner, the apparatus is configured to implement the decryption algorithm flow corresponding to the block cipher algorithm in the foregoing method embodiment. The steps performed by the obtaining unit 1001, the processing unit 1002, and the output unit 1003 when executing the decryption algorithm are described next.

An obtaining unit 1001 configured to obtain ciphertext data to be decrypted;

a processing unit 1002, configured to perform decryption processing on the ciphertext data to be decrypted by using a first block cipher algorithm to obtain plaintext data, where calculation of a round function in the first block cipher algorithm is implemented by using an instruction set, where the round function includes a nonlinear transformation, and the instruction set includes an instruction for solving the nonlinear transformation;

an output unit 1003, configured to output the plaintext data.

It is understood that the decryption algorithm provided by the embodiment of the present application follows the same implementation principle as the encryption algorithm, except that the round keys are used in the reverse of the order used for encryption. For content not described in detail in this section, reference may be made to the implementation flow of the encryption algorithm above.
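The relationship between encryption and decryption stated above can be illustrated with a small Python sketch of an SM4-style iteration: four 32-bit words of internal state, 32 rounds of the form X[i+4] = X[i] ^ F(X[i+1] ^ X[i+2] ^ X[i+3] ^ rk[i]), and an output formed by arranging the last four round outputs in reverse order. The mixing function `toy_f` and the round keys are placeholders assumed only for this sketch; they are not the SM4 round transformation or key expansion, and the isomorphic mapping between finite fields is omitted.

```python
MASK32 = 0xFFFFFFFF


def toy_f(x: int) -> int:
    """Placeholder mixing function (hypothetical); NOT the SM4 T transformation."""
    x = (x * 0x9E3779B1 + 0x7F4A7C15) & MASK32
    return x ^ (x >> 15)


def crypt(block, round_keys):
    """Run the SM4-style iteration over a block of four 32-bit words.

    The same routine serves for encryption and decryption; only the order
    of the round keys differs.
    """
    x = list(block)
    for rk in round_keys:
        # New word = oldest state word XOR F(other three words XOR round key).
        x.append(x[-4] ^ toy_f(x[-3] ^ x[-2] ^ x[-1] ^ rk))
    # Output is the last four round outputs in reverse order.
    return tuple(reversed(x[-4:]))


# Hypothetical round keys, fixed here only so the example is reproducible.
ROUND_KEYS = [(0xA3B1BAC6 * (i + 1)) & MASK32 for i in range(32)]

plaintext = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)
ciphertext = crypt(plaintext, ROUND_KEYS)
recovered = crypt(ciphertext, ROUND_KEYS[::-1])  # decryption: reversed key order
```

Because each round only XORs the oldest state word with a function of the other three, running the same routine with the round keys reversed undoes the rounds one by one, regardless of which function is used for F.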

It should be noted that, for the detailed description of the apparatus embodiment provided in the embodiment of the present application, reference may be made to related descriptions in a method for implementing a block cipher algorithm provided in the embodiment of the present application, and details are not repeated here.

Corresponding to the method for implementing the block cipher algorithm provided by the embodiment of the application, the embodiment of the application provides an electronic device.

Fig. 11 is a structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 11, the electronic device includes a memory 1101, a processor 1102, a communication interface 1103, and a communication bus 1104. The memory 1101, the processor 1102, and the communication interface 1103 are communicatively connected to each other through the communication bus 1104.

The memory 1101 may be a Read Only Memory (ROM), a static memory device, a dynamic memory device, or a Random Access Memory (RAM). The memory 1101 may store programs, and when the programs stored in the memory 1101 are executed by the processor 1102, the processor 1102 and the communication interface 1103 are used for executing the steps of the method for implementing the block cipher algorithm of the embodiment of the present application.

The processor 1102 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or one or more integrated circuits, and is configured to execute related programs to implement the functions that need to be executed by the units in the apparatus for implementing the block cipher algorithm according to the embodiment of the present application, or to execute the steps of the method for implementing the block cipher algorithm according to the embodiment of the present application.

The processor 1102 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the method for implementing the block cipher algorithm provided in the present application may be completed by integrated logic circuits of hardware in the processor 1102 or by instructions in the form of software. The processor 1102 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 1101, and the processor 1102 reads the information in the memory 1101 and, in combination with its hardware, performs the functions required to be performed by the units included in the apparatus for implementing the block cipher algorithm according to the embodiment of the present application, or performs the method for implementing the block cipher algorithm according to the embodiment of the present application.

The communication interface 1103 implements communication between the device shown in fig. 11 and other devices or communication networks using transceiver means, such as, but not limited to, a transceiver. For example, ciphertext data or the like may be output via the communication interface 1103.

A communication bus 1104 may include a path that conveys information between various components of the device shown in fig. 11 (e.g., memory 1101, processor 1102, communication interface 1103).

An embodiment of the present application further provides a storage device, where the storage device stores program instructions executable by a processor, and the program instructions are used to implement the steps of the method for implementing the block cipher algorithm provided in the present application.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored on a computer-readable medium and include several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device) to execute the method according to the embodiments of the present disclosure.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

1. Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. As defined herein, computer readable media does not include transitory computer readable media (transitory media), such as modulated data signals and carrier waves.

2. As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Although the present application has been described with reference to the preferred embodiments, these embodiments are not intended to limit the present application. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of protection of the present application should be determined by the claims that follow.

Claims

1. A method for implementing a block cipher algorithm, comprising:

acquiring plaintext data to be encrypted;

encrypting the plaintext data to be encrypted by using a first block cipher algorithm to obtain ciphertext data, wherein a round function in the first block cipher algorithm is calculated by using an instruction set, the round function comprises nonlinear transformation, and the instruction set comprises an instruction for solving the nonlinear transformation;

and outputting the ciphertext data.

2. The method according to claim 1, wherein encrypting the plaintext data to be encrypted by using the first block cipher algorithm to obtain the ciphertext data, with the round function in the first block cipher algorithm being calculated by using the instruction set, comprises:

obtaining input data of the nonlinear transformation, wherein the input data of the nonlinear transformation is determined according to a round key and the plaintext data, the input data of the nonlinear transformation is data in a first finite field, and the first finite field is a finite field of the first block cipher algorithm;

in a second finite field, implementing the nonlinear transformation on the input data of the nonlinear transformation by using the instruction set to obtain a result of the nonlinear transformation, wherein the result of the nonlinear transformation is data in the second finite field, the second finite field is a finite field of a second block cipher algorithm, the second finite field has an isomorphic relationship with the first finite field, and the first block cipher algorithm is different from the second block cipher algorithm;

obtaining the ciphertext data according to a result of the nonlinear transformation, the ciphertext data being data in the first finite field.

3. The method of claim 2, wherein the non-linear transformation comprises a first affine transformation and a first inverse affine transformation, and wherein implementing the non-linear transformation with the instruction set in the second finite field to obtain the result of the non-linear transformation comprises:

performing the first affine transformation on first data, second data and the input data of the non-linear transformation by using a first instruction in the instruction set, to obtain the first affine transformation result, wherein the first data is data for mapping a preset matrix into the second finite field according to an isomorphic matrix, the second data is data for mapping a preset vector into the second finite field according to the isomorphic matrix, and the isomorphic matrix is used for indicating the isomorphic relationship;

and executing the first inverse affine transformation on the preset matrix, the preset vector and the first affine transformation result by using a second instruction in the instruction set to obtain a result of the nonlinear transformation.

4. The method of claim 2, wherein the non-linear transformation comprises a first affine transformation, a second affine transformation, and an inverse transformation, and wherein implementing the non-linear transformation with the instruction set in the second finite field to obtain the result of the non-linear transformation comprises:

performing a first affine transformation on first data, second data and a mapping result of the input data by using a first instruction in the instruction set, to obtain the first affine transformation result, wherein the first data is data for mapping a preset matrix into the second finite field according to an isomorphic matrix, the second data is data for mapping a preset vector into the second finite field according to the isomorphic matrix, the isomorphic matrix is used for indicating the isomorphic relationship, and the mapping result of the input data is data for mapping the input data into the second finite field according to the isomorphic matrix;

performing the inverse transformation on the first affine transformation result by using a second instruction in the instruction set to obtain a result of the inverse transformation;

performing a second affine transformation on the first data, the second data, and the result of the inverse transformation using the first instruction, obtaining a result of the non-linear transformation.

5. The method according to any one of claims 2 to 4, wherein the round function further comprises a linear transformation, and the obtaining the ciphertext data according to the result of the non-linear transformation comprises:

performing the linear transformation on the result of the nonlinear transformation to obtain a result of the linear transformation;

performing an exclusive or operation on a result of the linear transformation and an internal state of the first block cipher algorithm, the internal state of the first block cipher algorithm being associated with the input data of the non-linear transformation, to obtain an output result of the round function;

and when the round key is a key used by the last round of iterative computation in the first block cipher algorithm, performing reverse order arrangement on mapping results of round function output results obtained by the last four rounds of iterative computation in the first block cipher algorithm, and determining the results obtained by the reverse order arrangement as the ciphertext data, wherein the mapping results of the round function output results obtained by any round of iterative computation are results obtained by mapping the round function output results obtained by any round of iterative computation to the first finite field by using the isomorphic matrix.

6. The method of claim 5, wherein performing the linear transformation on the result of the non-linear transformation to obtain the result of the linear transformation comprises:

performing a cyclic shift operation on the result of the nonlinear transformation to obtain a cyclic shift operation result;

and performing the linear transformation on the cyclic shift operation result according to a preset matrix set by using the first instruction to obtain a linear transformation result, wherein the number of preset matrices included in the preset matrix set is related to the cyclic shift operation result.

7. The method of claim 2, wherein the non-linear transformation comprises a first affine transformation and a first inverse affine transformation, and wherein implementing the non-linear transformation on the input data of the non-linear transformation by using the instruction set in the second finite field to obtain the result of the non-linear transformation comprises:

performing the first affine transformation on first data, second data and a mapping result of the input data by using a first instruction in the instruction set, to obtain the first affine transformation result, wherein the mapping result of the input data is data that maps the input data into the second finite field according to an isomorphic matrix, the first data is data that maps a preset matrix into the second finite field according to an isomorphic matrix, and the second data is data that maps a preset vector into the second finite field according to the isomorphic matrix, and the isomorphic matrix is used for indicating the isomorphic relationship;

performing a cyclic shift operation on the first affine transformation result to obtain a cyclic shift operation result;

performing the first inverse affine transformation on the cyclic shift operation result, third data and fourth data by using a second instruction in the instruction set, and obtaining a result of the nonlinear transformation, wherein the third data is data mapping a preset matrix set and a dot product result of the preset matrix into the second finite field, and the fourth data is data mapping the preset matrix set and the preset vector into the second finite field.

8. The method of claim 7, wherein obtaining the ciphertext data from the result of the non-linear transformation comprises:

performing an exclusive or operation on a result of the nonlinear transformation and an internal state of the first block cipher algorithm to obtain an output result of the round function, the internal state of the first block cipher algorithm being associated with input data of the nonlinear transformation;

and when the round key is a key used by the last round of iterative computation in the first block cipher algorithm, performing reverse order arrangement on mapping results of round function output results obtained by the last four rounds of iterative computation in the first block cipher algorithm, and determining the results obtained by the reverse order arrangement as the ciphertext data, wherein the mapping results of the round function output results obtained by any round of iterative computation are results obtained by mapping the round function output results obtained by any round of iterative computation to the first finite field by using the isomorphic matrix.

9. The method according to claim 4 or 7, characterized in that the method further comprises:

and mapping the input data of the nonlinear transformation to the second finite field by using a first instruction in the instruction set and the isomorphic matrix to obtain a mapping result of the input data.

10. The method according to any one of claims 2 to 4, wherein the first block cipher algorithm is the SM4 national cryptographic algorithm and the second block cipher algorithm is the Advanced Encryption Standard (AES).

11. The method according to claim 3 or 4, wherein the instruction set is a GFNI instruction set, the first instruction is a VGF2P8AFFINEQB instruction, and the second instruction is a VGF2P8AFFINEINVQB instruction.

12. A method for implementing a block cipher algorithm, comprising:

acquiring ciphertext data to be decrypted;

decrypting the ciphertext data to be decrypted by using a first block cipher algorithm to obtain plaintext data, wherein calculation of a round function in the first block cipher algorithm is realized by using an instruction set, the round function comprises nonlinear transformation, and the instruction set comprises an instruction for solving the nonlinear transformation;

and outputting the plaintext data.

13. An apparatus for implementing a block cipher algorithm, comprising:

an acquisition unit configured to acquire plaintext data to be encrypted;

a processing unit configured to encrypt the plaintext data to be encrypted by using a first block cipher algorithm to obtain ciphertext data, wherein calculation of a round function in the first block cipher algorithm is realized by using an instruction set, the round function comprises nonlinear transformation, and the instruction set comprises an instruction for solving the nonlinear transformation;

and an output unit configured to output the ciphertext data.

14. An apparatus for implementing a block cipher algorithm, comprising:

an acquisition unit configured to acquire ciphertext data to be decrypted;

a processing unit configured to decrypt the ciphertext data to be decrypted by using a first block cipher algorithm to obtain plaintext data, wherein calculation of a round function in the first block cipher algorithm is realized by using an instruction set, the round function comprises nonlinear transformation, and the instruction set comprises an instruction for solving the nonlinear transformation;

and an output unit configured to output the plaintext data.

15. A storage device storing program instructions executable by a processor to perform the method of any one of claims 1 to 12.

16. An electronic device comprising a processor, a memory, and a communication interface, wherein the memory and the processor are coupled with the communication interface, the memory is configured to store computer program code comprising computer instructions, and the processor is configured to invoke the computer instructions to implement the method of any of claims 1 to 12.