CN117097455A - Implementation method of SM4 cryptographic algorithm in Intel platform and encryption and decryption system

Info

Publication number: CN117097455A
Application number: CN202210517903.1A
Authority: CN
Inventors: 郭伟基
Original assignee: Shanghai Encryption Native Technology Co ltd
Current assignee: Shanghai Encryption Native Technology Co ltd
Priority date: 2022-05-12
Filing date: 2022-05-12
Publication date: 2023-11-21

Abstract

The embodiment of the application discloses a method for implementing the national cryptographic algorithm SM4 on an Intel platform. The method comprises multiple rounds of calculation, and each round comprises the following steps: obtaining first data from a part of the program state and a round key, wherein the program state is initially the data to be encrypted or the data to be decrypted; performing an S-box table look-up operation on the first data by using the GFNI instruction set to obtain second data; and performing a linear transformation on the second data to obtain third data, and integrating the third data into the program state. The embodiment of the application also discloses an S-box table look-up method, an encryption and decryption system, computer equipment and a computer storage medium. The technical scheme provided by the embodiment of the application achieves a secure constant-time implementation of SM4 and improves calculation performance.

Description

Implementation method of SM4 cryptographic algorithm in Intel platform and encryption and decryption system

Technical Field

The application relates to a communication security technology, in particular to an implementation method of a national encryption algorithm SM4 on an Intel platform, an encryption and decryption system, an S box table lookup method, computer equipment and a computer readable storage medium.

Background

The national encryption algorithm SM4 is a symmetric encryption algorithm specified by the national cryptographic standards for commercial cryptography applications, and its encryption consists of 32 rounds of a basic calculation. Each round requires an S-Box table look-up operation. A constant-time implementation of SM4 in pure software can use bit slicing or composite domain decomposition; an implementation in assembly language must either handle the S-Box table look-up itself or replace it with an alternative technique at the assembly programming level. Some ARM64 platforms also define optional SM4 instructions, with which certain basic operations can be implemented directly.

However, the Intel platform has no dedicated SM4 instructions like those of the ARM64 platform, and the table look-up operation is difficult to implement.

Disclosure of Invention

The embodiment of the application aims to provide a method for realizing a cryptographic algorithm SM4 on an Intel platform, an encryption and decryption system, an S box table lookup method, computer equipment and a computer readable storage medium, which are used for solving the problems.

An aspect of the embodiment of the present application provides a method for implementing a cryptographic algorithm SM4 on an intel platform, where the method includes multiple rounds of computation, and each round of computation includes:

obtaining first data according to a part of program states and a round key, wherein the program states are data to be encrypted or data to be decrypted initially;

performing S-box table look-up operation on the first data by using a GFNI instruction set to obtain second data;

and performing linear transformation on the second data to obtain third data, and integrating the third data into the program state.

Optionally, the performing the S-box table look-up operation on the first data by using the GFNI instruction set, to obtain second data includes:

and decomposing the first data into a plurality of bytes, performing S-box table lookup on each byte through two preset instructions of the GFNI instruction set, and splicing table lookup results of the plurality of bytes into the second data.

Optionally, the performing the S-box lookup by two predetermined instructions of the GFNI instruction set includes:

affine transforming the first element to be looked up into the Galois field of AES through a first preset instruction of GFNI to obtain a second element;

and carrying out multiplication inversion and affine transformation compound calculation on the second element through a second preset instruction of GFNI to obtain a table lookup result.

Optionally, the first predetermined instruction is the VGF2P8AFFINEQB instruction, and the second predetermined instruction is the VGF2P8AFFINEINVQB instruction.

Optionally, affine transforming the first element to be looked up to a galois field of AES by the first predetermined instruction of GFNI to obtain the second element includes:

considering the first element as an element of a Galois field of SM4, performing affine transformation on the first element by using a matrix represented by a parameter B1, and performing bitwise exclusive OR according to a column vector represented by a parameter D1 to obtain the second element of the Galois field of AES;

wherein the above-mentioned process is implemented by one of said first predetermined instructions.

Optionally, the performing the complex computation of multiplication inversion and affine transformation on the second element by the second predetermined instruction of GFNI to obtain a table look-up result includes:

solving multiplication inversion elements of the second element on a Galois field of AES, carrying out affine transformation on a multiplication inversion result by using a matrix represented by a parameter B2, and carrying out bitwise exclusive OR according to a column vector represented by a parameter D2 to obtain the table lookup result of the Galois field of SM 4;

wherein the above-mentioned process is implemented by one of said second predetermined instructions.

Optionally, the obtaining the first data according to the partial program state and the round key includes:

and performing exclusive OR calculation on the partial program state and the round key to obtain a calculation result as the first data.

Optionally, said integrating said third data into said program state comprises:

exclusive-or calculating the third data with another part of the program states except the part of the program states, and writing the exclusive-or result back to the other part of the program states.

An aspect of an embodiment of the present application further provides an encryption and decryption system, including:

the acquisition module is used for acquiring first data according to part of program states and round keys, wherein the program states are data to be encrypted or data to be decrypted initially;

the table look-up module is used for performing S-box table look-up operation on the first data by utilizing the GFNI instruction set to obtain second data;

and the integration module is used for carrying out linear transformation on the second data to obtain third data, and integrating the third data into the program state.

An aspect of the embodiment of the present application further provides a table look-up method for an S-box, including:

An aspect of the embodiments of the present application further provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method as described above when the computer program is executed.

An aspect of the embodiments of the present application further provides a computer-readable storage medium having stored thereon a computer program executable by at least one processor to cause the at least one processor to implement the steps of the method as described above.

The implementation method, encryption and decryption system, S box table look-up method, computer equipment and computer readable storage medium of the national encryption algorithm SM4 in the Intel platform provided by the embodiment of the application can comprise the following technical effects: the S-Box table lookup calculation of SM4 is completed by using the GFNI instruction set extension, the constant time safety realization is achieved, and the calculation performance of SM4 can be greatly improved.

Drawings

Fig. 1 schematically shows an application environment diagram of the method for implementing the cryptographic algorithm SM4 on the Intel platform according to the first embodiment of the present application;

Fig. 2 schematically shows a flowchart of the method for implementing the cryptographic algorithm SM4 on the Intel platform according to the first embodiment of the present application;

Fig. 3 schematically shows another form of flowchart of the method for implementing the cryptographic algorithm SM4 on the Intel platform according to the first embodiment of the present application;

Fig. 4 schematically shows a flowchart of the S-box table look-up method according to the second embodiment of the present application;

Fig. 5 schematically shows a block diagram of the encryption and decryption system according to the third embodiment of the present application;

Fig. 6 schematically shows a hardware architecture diagram of a computer device suitable for implementing the method for implementing the cryptographic algorithm SM4 on the Intel platform or the S-box table look-up method according to the fourth embodiment of the present application.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

It should be noted that the descriptions of "first," "second," etc. in the embodiments of the present application are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art; when a combination of technical solutions is contradictory or cannot be realized, that combination should be considered not to exist and falls outside the protection scope of the present application.

The following provides an explanation of terms involved in the present application:

CPU (Central Processing Unit ): is a component in a computer mainly responsible for completing various calculations. Modern CPUs have very fast computation speeds, mainly manifested by very high clock cycles per second.

Instruction (Instruction): a set of technical implementations for performing actual calculations in a CPU includes arithmetic calculation instructions, floating point calculation instructions, branch jump instructions, data load save instructions, vector instructions, and the like.

Instruction Set (Instruction Set): a related set of instructions may collectively be referred to as an instruction set. Common instruction sets are x86_64, ARM64, and the like.

Instruction set extension (Instruction Set Extension): in addition to the basic instruction set, many modern processors add special instructions that are suitable for a variety of different computing tasks to achieve the goal of fast computing certain data. These instructions are often divided into different instruction set extensions depending on the design purpose. The instruction set extensions to which the present application relates include, but are not limited to, AVX (Advanced Vector Extensions, advanced vector extension instruction set, also included in the present application are subsequent versions AVX2, AVX512, and possible future developments) and GFNI (Galois Field New Instructions, galois field new instruction).

SIMD (Single Instruction Multiple Data ): a technology specially designed for data-intensive computing tasks achieves the purpose of accelerating related computation by simultaneously computing a plurality of pieces of data by using a single instruction. Modern general purpose CPUs are typically provided with SIMD units or vector processors, which include specialized vector instructions, specialized vector registers. The vector register is different from the general register in that it has a large width and can store a plurality of pieces of data. For example, the general purpose registers of a 64-bit computer are typically 64 bits wide, while vector registers may be 128 bits, 256 bits, or even up to 512 or 2048 bits wide. The vector instruction is used for calculating the data stored in the vector register.

AVX: a set of instruction set extensions proposed by Intel Corporation that provides SIMD computing capability. It can currently operate on data up to 512 bits wide, and may support wider data in the future.

GFNI: an instruction set extension proposed by Intel Corporation that can quickly compute multiplicative inversion, affine transformation, and similar operations over a particular Galois field. GFNI provides SIMD computing capability. The VGF2P8AFFINEQB instruction of GFNI computes an affine transformation, and the VGF2P8AFFINEINVQB instruction computes a multiplicative inversion followed by an affine transformation in a single instruction, where the inversion is performed over the Galois field of AES. Affine transformation, multiplication, and multiplicative inversion over a Galois field are computationally complex, and dedicated hardware instructions can accelerate them greatly.

Galois Field (Galois Field): a mathematical structure whose elements can be added and multiplied according to defined rules, which may differ from those of ordinary arithmetic. The S-Boxes of some symmetric encryption algorithms are constructed over a specific Galois field, including the national encryption algorithm SM4 and the international standard algorithm AES (Advanced Encryption Standard, an international standard symmetric encryption algorithm). The Galois field used by both algorithms is defined on the range 0-255 (exactly the value range of one byte, i.e., all integers representable with eight bits), namely GF(256). Addition is binary addition without carry and can be implemented with a bitwise exclusive OR. Multiplication is more complex: a byte is interpreted as a polynomial of degree at most seven, with each bit corresponding to one coefficient, so multiplying two field elements means multiplying two polynomials, with like terms combined using the field's addition (bitwise exclusive OR, no carry). The product is a polynomial of degree at most 14, which is then divided by a fixed polynomial (called the modulus), and the remainder is taken as the final result. If the product of two elements is 1, each is called the multiplicative inverse of the other, and finding the multiplicative inverse of an element is called multiplicative inversion. SM4 and AES use different moduli, and their fields can be converted into each other mathematically by a fixed rule.
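As a non-limiting illustration of the multiplication and inversion rules just described, the following Python sketch implements GF(256) arithmetic for an arbitrary modulus. The function names `gf_mul` and `gf_inv` are hypothetical and chosen only for this example; the AES modulus is the one given in the AES standard, and the SM4 modulus is the polynomial stated later in this description. The loops are written for clarity and are not a constant-time implementation.

```python
AES_MOD = 0x11B  # x^8 + x^4 + x^3 + x + 1 (AES)
SM4_MOD = 0x1F5  # x^8 + x^7 + x^6 + x^5 + x^4 + x^2 + 1 (SM4)


def gf_mul(a, b, mod):
    """Multiply two GF(256) elements (bytes) and reduce by `mod`."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in the field is bitwise XOR
        b >>= 1
        a <<= 1              # multiply a by x
        if a & 0x100:
            a ^= mod         # reduce as soon as the degree reaches 8
    return result


def gf_inv(a, mod):
    """Multiplicative inverse computed as a^254; 0 is mapped to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a, mod)
    return result
```

The same byte generally has different inverses in the two fields, which is why a conversion between the fields is needed before an AES-field inversion can be reused for SM4.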

S Box (S-Box): one structure of the SM4 algorithm, which is embodied in a table of 256 elements in standard text, can be found using index values to complete the standard specified calculations.

ARM64: one CPU specification proposed by ARM company is a 64-bit general purpose processor, and has 128-bit SIMD processing units.

Constant time implementation: an implementation technique for cryptographic algorithms. In some cases, a particular implementation of a cryptographic algorithm may leak secret information through side channels (such as the execution time of a computation, the access time of specific data, the power consumed and the resulting supply voltage fluctuations, or the pattern of electromagnetic radiation emitted during computation), with the serious consequence that the secret is compromised. For example, S-Box table look-up implementations of AES or SM4 are prone to side channel leakage. Constant time implementation is a targeted countermeasure: it prevents side channel leakage by making observable effects such as the running time of the program independent of secret data.

Bit slicing: a software technique compiles a computational task into a hardware circuit of some kind and simulates the circuit using software. In the implementation of the cryptographic algorithm, the aim of realizing constant time can be achieved by using a bit slicing technology, but the calculation amount is generally large.

Composite domain decomposition: the galois field GF (256) may be further broken down into some combination of smaller fields. The decomposition may be followed by completion of the corresponding computation in a smaller domain and combining the results into data over GF (256), which may also be a constant time implementation technique, but with a larger computation.

Affine transformation (Affine Transformation): a mathematical transformation is capable of maintaining a linear and parallel relationship of geometric objects. In the present application, a conversion relationship between the galois field of SM4 and the galois field of AES is provided.

A constant-time implementation of the SM4 cryptographic algorithm in pure software can use bit slicing or composite domain decomposition; an implementation in assembly language must either handle the S-Box table look-up itself or replace it with an alternative technique at the assembly programming level.

The design of the national encryption algorithm SM4 (and of algorithms of the same type in general) makes a constant-time implementation difficult to achieve in a high-level language. Computing the S-Box requires special techniques, and whether bit slicing, composite domain decomposition, or another technique is used, the result is not only obscure but also often causes serious performance degradation, which hinders adoption. As a consequence, a large number of insecure implementations are widely deployed in technical projects bearing on the national economy and people's livelihood, creating potential security hazards.

Currently, NEON extended instruction sets can be used on an ARM64 platform, and S-Box calculation is achieved in a register table look-up mode and the like. The partial architecture of ARM64 also provides an optional SM 4-specific instruction set extension that can directly accomplish one or more rounds of related computations.

However, the Intel platform currently has no dedicated SM4 instructions like those of ARM64, and the table look-up operation is also difficult to implement. In 2018, Markku-Juhani O. Saarinen proposed a method that transforms data from the Galois field of SM4 to the Galois field of AES, completes the S-Box operation using the AESNI instruction set extension of the Intel platform, and then transforms the data from the Galois field of AES back to the Galois field of SM4, thereby computing the S-Box quickly. However, the method still has significant shortcomings: it has not been widely adopted, and its operation count remains large; in particular, the S-Box part still requires more than ten instructions.

In view of this, in order to implement a constant-time SM4 algorithm more efficiently on the Intel platform, achieve better security, and make fuller use of the CPU's computing performance, the embodiment of the present application provides an implementation scheme of the national encryption algorithm SM4 on the Intel platform. The data in the Galois field of SM4 is first transformed to the Galois field of AES by an affine transformation (the VGF2P8AFFINEQB instruction of GFNI may be used); the VGF2P8AFFINEINVQB instruction of GFNI is then used to complete the multiplicative inversion and the inverse affine transformation simultaneously, transforming the data back to the Galois field of SM4. This completes the core Galois field multiplicative inversion required by the S-Box and achieves a constant-time implementation.

The implementation of the cryptographic algorithm SM4 on the intel platform will be described by various embodiments.

In the description of the present application, it should be understood that the numerical references before the steps do not identify the order in which the steps are performed, but are merely used to facilitate description of the present application and to distinguish between each step, and thus should not be construed as limiting the present application.

Embodiments One and Two

Fig. 1 schematically shows an application environment diagram according to a first embodiment of the present application.

The computer device 10000 may be a server or a terminal device such as a smart phone, a tablet device, a PC (personal computer), or the like. The computer device 10000 performs various calculations according to instructions (mainly processed by the CPU in the computer device 10000). In this embodiment, the computer device 10000 is mainly used for performing encryption calculation on data to be encrypted (plaintext) to obtain encrypted data (ciphertext), wherein the encryption is implemented using the national encryption algorithm SM4. Of course, in other embodiments, the computer device 10000 can also perform the corresponding decryption calculation on the encrypted data (ciphertext) to recover the original data (plaintext).

Fig. 2 schematically shows a flowchart of a method for implementing the cryptographic algorithm SM4 according to the first embodiment of the present application on the intel platform. In this embodiment, the encryption operation of the cryptographic algorithm SM4 is 32 rounds of computation, and in each round of computation, steps S200 to S204 may be included, where:

step S200, obtaining first data according to the partial program state and the round key.

SM4 is a block cipher with a block length of 128 bits (i.e., 16 bytes, 4 words) and a key length of 128 bits (i.e., 16 bytes, 4 words). The encryption and decryption processes of SM4 use a 32-round iteration, and each round requires a round key. Round keys are typically derived by expanding the encryption key and differ from round to round. In the encryption process, the program state is initially the 128 bits (32 bits × 4) of data to be encrypted (in the decryption process, the data to be decrypted).

In each round of computation, it is first necessary to exclusive-or the partial program state with the round key to obtain 32-bit data (first data).

For example, the 128-bit program state is denoted as four 32-bit sub-states Z0, Z1, Z2, Z3, and the round key is denoted RKi (where i is the round index and 0 denotes the first round). In the first round, the partial program state consists of Z1, Z2, and Z3, and t = Z1 ⊕ Z2 ⊕ Z3 ⊕ RK0 is calculated. The resulting t is 32-bit data, i.e., the first data, where ⊕ denotes bitwise exclusive OR.

This step is an existing calculation step of the SM4 encryption process, and will not be described in detail here.

Step S202, performing S-Box table look-up operation on the first data by using a GFNI instruction set to obtain second data.

Specifically, the calculation result (first data) of the previous step is decomposed into 4 bytes (8 bits per byte), an S-Box lookup table is performed for each byte, and the lookup result (of 4 bytes) is spliced into one 32-bit data (second data). In this embodiment, in order to more efficiently implement the constant time SM4 algorithm on the intel platform, achieve better security, and more fully exert the computation performance of the CPU, the S-Box table lookup process is implemented using the GFNI instruction set.

The SM4 algorithm specification defines the S-Box as a 256-element table with 16 rows and 16 columns; the value for an index (1 byte) is found by using its upper and lower halves as the row number and column number, respectively. Mathematically, the S-Box of SM4 has also been decomposed and can be expressed as the following calculation on the index value x:

$$S(x) = A \cdot (A \cdot x \oplus C)^{-1} \oplus C$$

wherein A is an 8x8 matrix whose elements each take the value 0 or 1 (hereinafter, an 8x8 matrix on F2), and C is an eight-bit number (hereinafter, an 8-dimensional column vector on F2). In the above formula, $A \cdot x \oplus C$ denotes an affine transformation of x using the matrix represented by A, $(\cdot)^{-1}$ denotes multiplicative inversion over the Galois field of SM4, and $\oplus$ is bitwise exclusive OR. The modulus of the Galois field of SM4 is $x^8 + x^7 + x^6 + x^5 + x^4 + x^2 + 1$.

Using the transformation represented by a certain A1 value, an element y of the Galois field of SM4 can be transformed into an element z of the Galois field of AES. The corresponding transformation formula is:

$$z = A_1 \cdot y$$

The corresponding inverse transformation (i.e., transforming an element z of the Galois field of AES into an element y of the Galois field of SM4) is represented by the A2 value and is uniquely determined by the A1 value. A1 and A2 are also 8x8 matrices on F2; A1 has 8 possible values in total, and A2 correspondingly has 8 possible values. The transformation formula corresponding to the inverse transformation is:

$$y = A_2 \cdot z$$

The VGF2P8AFFINEQB instruction of the GFNI instruction set can complete one affine transformation on a plurality of pieces of data at once, and the corresponding formula is as follows:

$$y = B \cdot x \oplus D$$

The VGF2P8AFFINEINVQB instruction can complete a compound calculation of one multiplicative inversion and one affine transformation on a plurality of pieces of data, and the corresponding formula is as follows:

$$y = B \cdot x^{-1} \oplus D$$

where the multiplication inversion is performed over the galois field of AES.
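By way of illustration only, the per-byte behaviour of the two instructions can be modelled in software as below. This is a sketch, not the hardware: the function names are hypothetical, and the bit-ordering convention (output bit i is the parity of qword byte 7-i ANDed with the input byte, then XORed with bit i of the 8-bit constant) is assumed from Intel's published pseudocode and should be checked against the instruction reference. The matrix and constant used in the check are the well-known AES S-box affine parameters, not SM4 parameters.

```python
AES_MOD = 0x11B  # AES field modulus, used by the inverting variant


def gf_mul(a, b):
    """GF(256) multiplication modulo the AES polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= AES_MOD
    return result


def gf_inv(a):
    """Multiplicative inverse as a^254; 0 maps to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def parity(v):
    return bin(v).count("1") & 1


def affine_byte(matrix_qword, x, imm8):
    """Model of one byte lane of VGF2P8AFFINEQB: y = B*x xor D."""
    y = 0
    for i in range(8):
        row = (matrix_qword >> (8 * (7 - i))) & 0xFF  # row for output bit i
        y |= (parity(row & x) ^ ((imm8 >> i) & 1)) << i
    return y


def affine_inv_byte(matrix_qword, x, imm8):
    """Model of one byte lane of VGF2P8AFFINEINVQB: y = B*inv(x) xor D."""
    return affine_byte(matrix_qword, gf_inv(x), imm8)


IDENTITY_MATRIX = 0x0102040810204080   # B = I under this bit ordering
AES_SBOX_MATRIX = 0xF1E3C78F1F3E7CF8   # AES S-box affine matrix
AES_SBOX_CONST = 0x63                  # AES S-box affine constant
```

With the AES parameters, `affine_inv_byte` reproduces the AES S-box, which is a convenient sanity check of the model before substituting SM4-specific B and D values.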

In the two formulas above, B (B1, B2) is an 8x8 matrix on F2 and D (D1, D2) is an 8-dimensional column vector on F2; all parameters are supplied by the caller of the instruction. The transformation parameters B1 and D1 are determined by A, C, and A1, and the transformation parameters B2 and D2 are determined by A, A2, and C. A and C are publicly known constants, while A1 and A2 are values calculated in the present application from the conversion relationship between the Galois field used by SM4 and the Galois field used by AES (eight pairs in total, any of which may be used).
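One way to see why exactly eight conversions exist is sketched below. This is an illustrative derivation, not necessarily the procedure used by the applicant: each root of the SM4 modulus inside the AES field defines one field isomorphism, and an irreducible polynomial of degree 8 has eight such roots. The function names and the column-list representation of a matrix (entry j is the image of bit j) are assumptions made for this example.

```python
AES_MOD = 0x11B  # x^8 + x^4 + x^3 + x + 1
SM4_MOD = 0x1F5  # x^8 + x^7 + x^6 + x^5 + x^4 + x^2 + 1


def gf_mul(a, b, mod):
    """GF(256) multiplication modulo `mod`."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= mod
    return result


def poly_eval(poly, beta, mod):
    """Evaluate a binary polynomial (bit k = coefficient of x^k) at beta."""
    acc = 0
    for k in range(8, -1, -1):          # Horner's rule, degree 8 downwards
        acc = gf_mul(acc, beta, mod) ^ ((poly >> k) & 1)
    return acc


def sm4_to_aes_maps():
    """Return every candidate A1 as a list of 8 column bytes."""
    maps = []
    for beta in range(256):
        if poly_eval(SM4_MOD, beta, AES_MOD) == 0:   # beta is a root
            cols, power = [], 1
            for _ in range(8):                        # x^j maps to beta^j
                cols.append(power)
                power = gf_mul(power, beta, AES_MOD)
            maps.append(cols)
    return maps


def apply_map(cols, x):
    """Multiply the matrix given by `cols` with the bit vector x."""
    y = 0
    for j in range(8):
        if (x >> j) & 1:
            y ^= cols[j]
    return y
```

Each returned map sends products in the SM4 field to products in the AES field, so it also sends multiplicative inverses to multiplicative inverses, which is the property the two-instruction look-up relies on.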

Therefore, in this embodiment, only two instructions can complete the calculation of the S-Box table look-up equivalent to SM4 for a plurality of pieces of data, so as to obtain the second data.

Step S204, performing linear transformation on the second data to obtain third data, and integrating the third data into the program state.

Specifically, the resulting second data is subjected to linear transformation once, and the result is still 32-bit data (third data). The resulting result (the third data) is then integrated into the program state. The integrating means exclusive-or the third data with another part of the program state and writing the exclusive-or result back to the other part of the program state.

For example, assuming that the third data u' (32-bit data) is obtained after the linear transformation, Z0 is XORed with u' and the result is written back to Z0, i.e., Z0 = Z0 ⊕ u' is calculated, so that the third data is integrated into the program state.

Note that the above describes the first round of calculation, in which Z0 of the program state is processed. In the second round, Z1 is processed, and so on; after four rounds, processing returns to Z0, and this continues until all 32 rounds are completed.

After the completion of the 32 rounds of calculation according to the steps, the program states are properly arranged, and the required ciphertext can be obtained. The decryption calculation of SM4 is substantially the same as the encryption calculation, except that the order of use of round keys is reversed.
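The round structure of steps S200 to S204 can be sketched as follows. This is a minimal illustration under stated assumptions: the linear transformation is the one defined in the SM4 standard (XOR of the word with its left rotations by 2, 10, 18, and 24 bits), the final "arrangement" is taken to be the standard reversal of the four words, and the byte-level S-box is passed in as a parameter because the real table is not reproduced here. The stand-in `toy_sbox` is NOT the SM4 S-box, and all function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value, n):
    """Rotate a 32-bit word left by n bits."""
    return ((value << n) | (value >> (32 - n))) & MASK32


def linear_transform(b):
    """Linear transformation L of the SM4 round function."""
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24)


def sbox_word(word, sbox_byte):
    """Split a word into 4 bytes, substitute each, and splice them back."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= sbox_byte((word >> shift) & 0xFF) << shift
    return out


def sm4_rounds(block_words, round_keys, sbox_byte):
    """Run one pass of rounds over the state Z0..Z3 and arrange the output."""
    z = list(block_words)
    for i, rk in enumerate(round_keys):
        t = z[(i + 1) % 4] ^ z[(i + 2) % 4] ^ z[(i + 3) % 4] ^ rk  # S200
        u = sbox_word(t, sbox_byte)                                # S202
        z[i % 4] ^= linear_transform(u)                            # S204
    return z[::-1]  # reverse the word order to form the output block


def toy_sbox(b):
    """Placeholder substitution for demonstration only (not SM4's S-box)."""
    return (b * 7 + 3) & 0xFF
```

Because decryption differs only in the order of the round keys, calling `sm4_rounds` on the output with the reversed key list recovers the input, whatever substitution function is supplied.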

Fig. 3 schematically shows another form of flowchart of the method for implementing the cryptographic algorithm SM4 on the Intel platform according to the first embodiment of the present application, illustrating the data calculation procedure in each round of the SM4 encryption operation. In each round shown in Fig. 3, a part of the program state is XORed with the round key, the result is decomposed into 4 bytes (8 bits per byte), the S-Box table look-up is performed on each byte by two instructions (the VGF2P8AFFINEQB instruction and the VGF2P8AFFINEINVQB instruction), the look-up results are spliced into one 32-bit value, a linear transformation is applied, and the result is finally integrated into the program state (XORed with another part of the program state, with the XOR result written back to that part).

In an exemplary embodiment, S-Box calculation u=sm4box (x) of SM4 may be implemented using GFNI instructions. Fig. 4 schematically shows a flowchart of an S-box look-up table method according to a second embodiment of the application. In this embodiment, the method is applicable to the S-box table look-up calculation process (corresponding to the partial sub-flow of step S202 described above) in each round of calculation of the encryption operation of the cryptographic algorithm SM4, and may include steps S400-S402, where:

Step S400, the element x to be looked up is affine transformed to the Galois field of AES by the first predetermined instruction of GFNI to obtain the element z.

In this embodiment, the first predetermined instruction is the VGF2P8AFFINEQB instruction.

The transformation first treats the element to be looked up (index value) x as an element of the Galois field of SM4 and applies an affine transformation, $y = A \cdot x \oplus C$, to obtain the element y. The element y is then transformed again, $z = A_1 \cdot y$, converting it to the Galois field of AES and yielding the element z. These two steps can be combined mathematically into a single affine transformation, which in this embodiment is implemented using the VGF2P8AFFINEQB instruction of GFNI. The transformation formula is:

$$z = B_1 \cdot x \oplus D_1$$

wherein the transformation parameters B1 and D1 are determined by A, C, and A1, and can be taken as $B_1 = A_1 \cdot A$ and $D_1 = A_1 \cdot C$.

For example, the transformation parameters B1 and D1 commonly determined by A, C, A1 may be (a total of eight possibilities, only one of which is exemplified here):

Step S402, performing a compound calculation of multiplicative inversion and affine transformation on the element z through the second predetermined instruction of GFNI to obtain the final table look-up result u.

In this embodiment, the second predetermined instruction is the VGF2P8AFFINEINVQB instruction.

The compound calculation first takes the multiplicative inverse of the element z over the Galois field of AES and then converts the result to the Galois field of SM4 by an affine transformation to obtain the element w. The element w is then subjected to a further affine transformation to obtain the final result u. The two affine transformations in this procedure can be mathematically combined into one. In this embodiment, the compound calculation is implemented using the VGF2P8AFFINEINVQB instruction of GFNI, so that the multiplicative inversion and the combined affine transformation are completed by a single instruction. The transformation formula is:

u = B2·z^(-1) + D2

where z^(-1) denotes the multiplicative inverse of z in the Galois field of AES, and the transformation parameters B2 and D2 are determined by A, A, C, with D2 = C.
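The relations between the parameters of the two instructions can be written out as a short derivation. This is a reconstruction under stated assumptions, not the formulas of the application: it assumes the SM4 S-Box has the usual form u = A·(A·x + C)^(-1) + C with the inversion taken in the Galois field of SM4, that A1 is the linear map from the Galois field of SM4 to that of AES, and it writes M (a hypothetical name) for the linear map back from the AES field to the SM4 field.

```latex
\begin{aligned}
y &= A\,x + C, &\qquad z &= A_1\,y = (A_1 A)\,x + A_1 C, \\
w &= M\,z^{-1}, &\qquad u &= A\,w + C = (A M)\,z^{-1} + C, \\[4pt]
B_1 &= A_1 A, \quad D_1 = A_1 C, &\qquad B_2 &= A M, \quad D_2 = C .
\end{aligned}
```

Under these assumptions the result D2 = C agrees with the relation stated in the text, and M would be the inverse of A1 whenever the same field isomorphism is used in both directions.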

For example, the transformation parameters B2 and D2 jointly determined by A, A, C may be as follows (there are eight possibilities, each corresponding to one of the foregoing B1, D1; only one is listed here):

Although a conversion relationship between the Galois field of SM4 and the Galois field of AES necessarily exists in SM4 encryption and decryption schemes, the calculation and derivation of the specific conversion parameters are not obvious. The AESNI method proposed by Markku-Juhani O. Saarinen, whose conversion parameters are determined for the AESNI instruction set extension, cannot be used directly in this embodiment. The reason is that the AESNI instructions used by that method additionally compute an affine transformation required by AES, and this transformation has to be cancelled in the inverse transformation parameters. The transformation parameters proposed in this embodiment are therefore of practical applicability and significance for SM4 encryption and decryption schemes based on the Intel platform.

The encryption method described in the first embodiment and the S-box table lookup method described in the second embodiment have at least the following technical effects:

First: the S-Box table lookup of SM4 is completed using the GFNI instruction set extension. The VGF2P8AFFINEQB instruction of GFNI converts data from the Galois field of SM4 to the Galois field of AES; the VGF2P8AFFINEINVQB instruction then performs the multiplicative inversion and the inverse affine transformation at the same time and converts the data back to the Galois field of SM4. This completes the core Galois-field multiplicative inversion required by the S-Box and achieves a secure constant-time implementation. In addition, the method completes the S-Box calculation of SM4 with only two instructions and can process up to 512 bits of data at a time; since each data word is 32 bits, 16 data blocks can be computed in parallel, which can greatly improve SM4 encryption and decryption performance.

Second: because the technique adopted by the method greatly improves computing performance, the project parties and business parties that apply national cryptographic technology no longer need to sacrifice security for performance. In other words, this technical innovation further promotes the security of systems and projects that adopt the national cryptographic algorithms in practical applications.
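The 16-way parallelism can be pictured with a small model in which a 512-bit register is treated as 64 bytes holding one 32-bit word from each of 16 independent blocks. The byte ordering used here and the helper names are assumptions for illustration only; `sbox` is again a caller-supplied placeholder.

```python
LANES = 16          # 512 bits / 32 bits per word
VECTOR_BYTES = 64   # 512 bits

def pack_words(words):
    """Pack 16 32-bit words (one per block) into a 64-byte vector."""
    if len(words) != LANES:
        raise ValueError("expected exactly 16 words")
    return b"".join(w.to_bytes(4, "big") for w in words)

def sbox_vector(vec, sbox):
    """Apply a byte-wise S-Box to all 64 bytes, as one vector op would."""
    if len(vec) != VECTOR_BYTES:
        raise ValueError("expected a 64-byte vector")
    return bytes(sbox(b) for b in vec)

def unpack_words(vec):
    """Split a 64-byte vector back into 16 32-bit words."""
    return [int.from_bytes(vec[i:i + 4], "big")
            for i in range(0, VECTOR_BYTES, 4)]
```

On hardware, the two GFNI instructions process all 64 bytes in a single pass each, so the per-round S-Box cost does not grow with the number of blocks up to 16.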

Example III

Fig. 5 schematically shows a block diagram of an encryption and decryption system according to a third embodiment of the present application. The system may be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to implement the embodiments of the present application. A program module in the embodiments of the present application refers to a series of computer program instruction segments capable of performing a specified function; the function of each program module is described in detail below.

In this embodiment, the encryption and decryption system is an implementation system of the cryptographic algorithm SM4 on the Intel platform, and is applied to each round of computation of the SM4 encryption operation.

As shown in fig. 5, the encryption and decryption system 500 may include an acquisition module 510, a table lookup module 520, and an integration module 530, where:

an acquisition module 510, configured to obtain first data according to a part of the program state and the round key (by performing an exclusive-or calculation on the part of the program state and the round key);

the table lookup module 520 is configured to perform an S-Box table lookup operation on the first data by using a GFNI instruction set to obtain second data;

and an integrating module 530, configured to perform linear transformation on the second data to obtain third data, and integrate the third data into the program state.

As an alternative embodiment, the table lookup module 520 may further specifically include a decomposition submodule 5200, a first instruction submodule 5210, a second instruction submodule 5220, and a stitching submodule 5230, where:

A decomposition submodule 5200, configured to decompose the first data into a plurality of bytes (4 bytes). A separate table lookup is then performed for each byte.

The first instruction submodule 5210 is configured to affine-transform the element x to be looked up into the Galois field of AES by a first predetermined instruction of GFNI to obtain the element z.

The second instruction submodule 5220 is configured to perform the compound calculation of multiplicative inversion and affine transformation on the element z by a second predetermined instruction of GFNI, so as to obtain the final table lookup result u.

And a stitching sub-module 5230, configured to stitch the table lookup result of the plurality of bytes (4 bytes) into the second data (32-bit data).

For the specific functions of the above modules, refer to the first and second embodiments; they are not repeated here.

Example IV

Fig. 6 schematically illustrates a hardware architecture diagram of a computer device suitable for implementing the implementation method of the cryptographic algorithm SM4 on the Intel platform or the S-box table lookup method according to the fourth embodiment of the present application. In this embodiment, the computer device 10000 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, for example, a terminal device such as a server, a smart phone, a tablet computer, a vehicle-mounted terminal, a game machine, or a virtual device.

As shown in Fig. 6, the computer device 10000 includes at least, but is not limited to, a memory 10010, a processor 10020 and a network interface 10030, which may be communicatively linked to each other via a system bus. Wherein:

The memory 10010 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), Random Access Memory (RAM), Static Random Access Memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 10010 may be an internal storage module of the computer device 10000, such as a hard disk or memory of the computer device 10000. In other embodiments, the memory 10010 may also be an external storage device of the computer device 10000, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 10000. Of course, the memory 10010 may also include both an internal storage module of the computer device 10000 and an external storage device thereof. In this embodiment, the memory 10010 is generally used for storing the operating system and various application software installed on the computer device 10000, for example, the program code of the implementation method of the cryptographic algorithm SM4 on the Intel platform or of the S-box table lookup method. In addition, the memory 10010 may be used to temporarily store various types of data that have been output or are to be output.

The processor 10020 may be a CPU, controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 10020 is typically configured to control overall operation of the computer device 10000, such as performing control and processing related to data interaction or communication with the computer device 10000. In this embodiment, the processor 10020 is configured to execute program codes or process data stored in the memory 10010.

The network interface 10030 may comprise a wireless network interface or a wired network interface, which network interface 10030 is typically used to establish a communication link between the computer device 10000 and other computer devices. For example, the network interface 10030 is used to connect the computer device 10000 to an external terminal through a network, establish a data transmission channel and a communication link between the computer device 10000 and the external terminal, and the like. The network may be a wireless or wired network such as an Intranet (Intranet), the Internet (Internet), a global system for mobile communications (Global System of Mobile communication, abbreviated as GSM), wideband code division multiple access (Wideband Code Division Multiple Access, abbreviated as WCDMA), a 4G network, a 5G network, bluetooth (Bluetooth), wi-Fi, etc.

It should be noted that fig. 6 only shows a computer device having components 10010-10030, but it should be understood that not all of the illustrated components are required to be implemented, and more or fewer components may be implemented instead.

In this embodiment, the encryption method stored in the memory 10010 may be further divided into one or more program modules and executed by one or more processors (the processor 10020 in this embodiment) to complete the present application.

Example V

The present embodiment also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the implementation method of the cryptographic algorithm SM4 on the Intel platform or of the S-box table lookup method in the foregoing embodiments.

In this embodiment, the computer-readable storage medium includes a flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the computer readable storage medium may be an internal storage unit of a computer device, such as a hard disk or a memory of the computer device. In other embodiments, the computer readable storage medium may also be an external storage device of a computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card), etc. that are provided on the computer device. Of course, the computer-readable storage medium may also include both internal storage units of a computer device and external storage devices. In this embodiment, the computer-readable storage medium is typically used to store an operating system installed on a computer device and various types of application software, such as program codes of the encryption method in the embodiment, and the like. Furthermore, the computer-readable storage medium may also be used to temporarily store various types of data that have been output or are to be output.

It will be apparent to those skilled in the art that the modules or steps of the embodiments of the application described above may be implemented on a general-purpose computing device. They may be concentrated on a single computing device or distributed across a network of computing devices. Alternatively, they may be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from that given here, or the modules or steps may each be fabricated as an individual integrated circuit module, or several of them may be fabricated as a single integrated circuit module. Thus, embodiments of the application are not limited to any specific combination of hardware and software.

The foregoing description of the preferred embodiments of the present application should not be taken as limiting the scope of the application, but rather should be understood to cover all modifications, equivalents, and alternatives falling within the scope of the application as defined by the following description and drawings, or by direct or indirect application to other relevant art(s).

Claims

1. A method for implementing a cryptographic algorithm SM4 on an Intel platform, the method comprising a plurality of rounds of computation, wherein each round of computation comprises:

2. The method of claim 1, wherein performing an S-box look-up operation on the first data using a GFNI instruction set to obtain second data comprises:

3. The method of claim 2, wherein S-box look-up by two predetermined instructions of the GFNI instruction set comprises:

4. A method according to claim 3, wherein the first predetermined instruction is a VGF2P8AFFINEQB instruction and the second predetermined instruction is a VGF2P8AFFINEINVQB instruction.

5. The method of claim 3 or 4, wherein affine transforming the first element to be looked up to the galois field of AES by the first predetermined instruction of GFNI to obtain the second element comprises:

6. The method of any one of claims 3 to 5, wherein performing the compound calculation of multiplicative inversion and affine transformation on the second element by the second predetermined instruction of GFNI to obtain a table lookup result comprises:

7. The method of any one of claims 1 to 6, wherein obtaining the first data based on the partial program state and the round key comprises:

8. The method of any of claims 1 to 7, wherein said integrating said third data into said program state comprises:

9. An encryption and decryption system, comprising:

10. An S-box look-up table method, comprising:

11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 8 or claim 10 when the computer program is executed by the processor.

12. A computer readable storage medium having a computer program stored thereon, the computer program being executable by at least one processor to cause the at least one processor to perform the steps of the method of any of claims 1 to 8 or claim 10.