CN111736902B - Parallel computing method and device of SM4 based on SIMD (Single instruction multiple data) instructions and readable storage medium - Google Patents
- Publication number
- CN111736902B (application CN202010687106.9A)
- Authority
- CN
- China
- Prior art keywords
- simd
- sbox
- calculation
- instructions
- message
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3887—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Devices For Executing Special Programs (AREA)
Abstract
The invention provides a parallel computing method and device for SM4 based on SIMD instructions, and a readable storage medium. The method comprises: arranging a plurality of input SM4 message blocks to obtain arranged message blocks; performing an optimized SM4 encryption or decryption process on the arranged message blocks, in which the Sbox operation is computed with a composite field technique instead of a table lookup, and the inversion in GF(2^8) required by the composite field technique is carried out with fast multiplication in GF(2^4) based on SIMD instructions; and performing an inverse arrangement on the result of the encryption or decryption process to obtain the ciphertext or plaintext corresponding to the message blocks. The method combines an equivalent transformation of the nonlinear operation of SM4 based on the composite field technique, an equivalent transformation of the linear operations of SM4 obtained by reordering operations and fusing several linear transformations, and fast SIMD-based multiplication in GF(2^4), and thereby improves the execution speed of SM4 encryption and decryption.
Description
Technical Field
The invention relates to the technical field of computer security, and in particular to a parallel computing method and device for SM4 based on SIMD instructions, and a computer-readable storage medium.
Background
To secure data encryption, countries around the world have introduced their own standard algorithms, such as the AES algorithm in the United States, the CLEFIA and Camellia algorithms in Japan, and the SM4 algorithm in China, formerly known as the SMS4 algorithm.
The SM4 cryptographic algorithm is built on a 4-branch generalized Feistel structure. The plaintext, the ciphertext, and the key are each 128 bits long. SM4 comprises an encryption algorithm, a decryption algorithm, and a key schedule algorithm, the key schedule operating on the 128-bit key. The encryption and decryption algorithms share the same 32-round nonlinear round function and a single reverse order transformation R, and differ only in the order in which the 32 round keys are used. Let four 32-bit variables $(X_0, X_1, X_2, X_3)$ represent the 128-bit plaintext input. The encryption algorithm proceeds as follows:
1. Perform 32 iterations, where $RK_i$ is the round key of round $i$:
$X_{i+4} = F(X_i, X_{i+1}, X_{i+2}, X_{i+3}, RK_i) = X_i \oplus T(X_{i+1} \oplus X_{i+2} \oplus X_{i+3} \oplus RK_i), \quad i = 0, 1, \ldots, 31$
T is composed of a nonlinear transformation $\tau$ and a linear transformation L, $T(U) = L(\tau(U))$. Writing $U = (u_0, u_1, u_2, u_3)$ for the 4 bytes of the 32-bit input, the nonlinear transformation is:
$V = (v_0, v_1, v_2, v_3) = \tau(U) = (Sbox(u_0), Sbox(u_1), Sbox(u_2), Sbox(u_3))$
and the linear transformation L is, with $\lll$ denoting a 32-bit left rotation:
$L(V) = V \oplus (V \lll 2) \oplus (V \lll 10) \oplus (V \lll 18) \oplus (V \lll 24)$
2. Apply the reverse order transformation R to the data of the last round to obtain the ciphertext:
$(Y_0, Y_1, Y_2, Y_3) = R(X_{32}, X_{33}, X_{34}, X_{35}) = (X_{35}, X_{34}, X_{33}, X_{32})$
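The round structure described above can be sketched in a few lines of Python. This is only an illustration of the Feistel iteration and of R: the function `toy_t` is a hypothetical placeholder and is not SM4's real T (it contains neither the Sbox nor L), and the names `sm4_like_rounds` and `toy_t` are assumptions introduced here. The point of the sketch is that, whatever T is, running the same routine with the round keys in reverse order undoes the encryption.

```python
MASK32 = 0xFFFFFFFF

def toy_t(word):
    """Hypothetical stand-in for SM4's T; NOT the real Sbox/L composition."""
    return ((word * 0x9E3779B1) ^ (word >> 7)) & MASK32

def sm4_like_rounds(block, round_keys, t=toy_t):
    """Run X[i+4] = X[i] ^ T(X[i+1] ^ X[i+2] ^ X[i+3] ^ RK[i]), then apply R."""
    x = list(block)                       # X0, X1, X2, X3
    for rk in round_keys:                 # one iteration per round key
        x.append(x[-4] ^ t(x[-3] ^ x[-2] ^ x[-1] ^ rk))
    # Reverse order transformation R: (X35, X34, X33, X32) for 32 round keys.
    return (x[-1], x[-2], x[-3], x[-4])

def decrypt(block, round_keys, t=toy_t):
    """Decryption is the same routine with the round keys reversed."""
    return sm4_like_rounds(block, round_keys[::-1], t)
```

With 32 round keys, `decrypt(sm4_like_rounds(p, rks), rks)` returns `p` for any 4-word block `p`, which is the property the text relies on when it says encryption and decryption differ only in the order of the round keys.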
The implementation of China's Cybersecurity Law and Cryptography Law has promoted the adoption of the SM4 algorithm, but the reduction in transaction processing speed that encryption and decryption introduce into computer information systems has become an obstacle to its adoption.
Hardware implementations improve the encryption and decryption efficiency of SM4 by continually reducing the number of gates needed to realize the algorithm. The key technique is to use the composite field technique to equivalently transform the nonlinear inversion of the SM4 Sbox in GF(2^8) into operations in GF(2^4)^2, and to apply the composite field technique further until the operations are reduced to operations in GF(2), which can be realized with logic gates.
There are currently 4 main technical approaches for software implementation: GPU, SM4 hardware instructions, AESNI, and bit slicing (bitslice):
1. GPU: the efficiency of SM4 encryption and decryption is improved by using the parallel capability of the GPU;
2. SM4 hardware instructions: hardware instructions supporting the SM4 encryption and decryption algorithm are built into the CPU;
3. AESNI: exploiting the algebraic isomorphism between the Sbox of AES and the Sbox of SM4, the Sbox operation of SM4 is completed with the AESNI instruction AESENCLAST, by mapping from the algebraic structure of SM4 into the algebraic structure of AES;
4. Bitslice: 256 data blocks are processed simultaneously using the bitslice technique and the 256-bit registers of the AVX2 instructions;
The technical defects of these schemes are as follows: the GPU and SM4 hardware instructions are not general solutions; AESNI depends on the presence of AESNI hardware instructions, and only AESNI instructions operating on 128-bit registers are available, so when they are combined with SIMD instructions on 256-bit registers, the AESNI-related operations must be serialized; Bitslice must process 256 data blocks (4096 bytes) at a time, which limits its applicability, and both the data arrangement steps and the multiplication in GF(2^4) involved in the Bitslice technique are complex.
Disclosure of Invention
To address the defects of the prior art, the invention provides a parallel computing method for SM4 based on SIMD instructions. The method arranges a plurality of input message blocks to obtain arranged message blocks; for the arranged message blocks, based on SIMD instructions, it uses GF(2^4) in place of GF(2^8) to complete the Sbox substitution of SM4, thereby carrying out the inversion operation of the round function and obtaining its result; and it performs an inverse arrangement on the result of the inversion operation to obtain the ciphertext corresponding to the message blocks, overcoming the defects of the conventional methods.
The specific scheme of the invention is as follows:
A parallel computing method for SM4 based on SIMD instructions, comprising the following steps:
Step S1: arrange a plurality of input message blocks to obtain arranged message blocks;
Step S2: for the arranged message blocks, based on SIMD instructions, use GF(2^4) in place of GF(2^8) to complete the Sbox substitution of SM4, thereby carrying out the inversion operation of the round function and obtaining the result of the inversion operation;
The Sbox substitution of SM4 using GF(2^4) in place of GF(2^8) based on SIMD instructions is computed as follows:
The Sbox substitution replaces an 8-bit input with an 8-bit output Sbox(u) according to a 256-byte substitution table, which satisfies:
$Sbox(u) = A \cdot (A \cdot u + C)^{-1} + C$
where A is an 8×8 matrix, C is an 8×1 matrix, u is an 8-bit number, and every element of the matrices A and C is an element of GF(2):
The mathematical structure of the SM4 Sbox is defined as: GF(2^8), $f(x) = 1 + x^2 + x^4 + x^5 + x^6 + x^7 + x^8$;
The finite field is defined as: GF(2^4), $g(x) = 1 + x + x^4$;
The mathematical structure of the Sbox is isomorphic to a quadratic algebraic extension of this finite field:
The isomorphic mapping matrix $M$ and the inverse isomorphic mapping matrix $M^{-1}$ are defined as:
Thus, write $v \in GF(2^8)$ as $v = a_0 + a_1 y$, where $a_0, a_1 \in GF(2^4)$. Since $v^{-1} \in GF(2^8)$, write $v^{-1} = b_0 + b_1 y$ with $b_0, b_1 \in GF(2^4)$. Then from $(a_0 + a_1 y)(b_0 + b_1 y) = 1$ we have:
That is, the inversion in GF(2^8) can be carried out in GF(2^4)^2, and the computation proceeds as follows:
where u is an element of GF(2^8); an affine transformation of u gives v in GF(2^8); the isomorphic mapping takes v to an element w of GF(2^4)^2; the composite field inversion of w gives the element $w^{-1}$ of GF(2^4)^2; the inverse isomorphic mapping takes $w^{-1}$ to s in GF(2^8); and finally an affine transformation of s gives the element t of GF(2^8). $w^{-1} = Inv(w)$ denotes the inversion in GF(2^4)^2;
Step S3: perform an inverse arrangement on the result of the inversion operation to obtain the ciphertext corresponding to the message blocks.
Further, the multiplication in GF(2^4) used in the inversion in GF(2^4)^2 proceeds as follows: let $GF(2^4)^*$ denote the multiplicative group of GF(2^4); $g = 0x02 \in GF(2^4)^*$ is a generator of $GF(2^4)^*$, i.e. for every $e \in GF(2^4)^*$ there exists $i \le 15$ such that $e = g^i$. To compute $c = a \cdot b$ with $a, b, c \in GF(2^4)$, first compute $\log_g c = \log_g a + \log_g b$, then compute $c = g^{\log_g c}$, where $\log_g a$ and $\log_g b$ are obtained by looking up a LOG table containing 16 elements.
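The logarithm-based multiplication can be illustrated with a scalar Python sketch, using the field polynomial $g(x) = 1 + x + x^4$ and the generator 0x02 given in the text. The names `gf16_mul_ref`, `gf16_mul`, `EXP` and `LOG` are introduced here for illustration, and two details are assumptions of this sketch rather than statements from the patent: the sum of logarithms is reduced modulo 15 (the order of the multiplicative group), and a zero operand is handled as a special case because zero has no logarithm.

```python
POLY = 0b10011          # g(x) = x^4 + x + 1
G = 0x02                # generator named in the text

def gf16_mul_ref(a, b):
    """Reference shift-and-add multiplication in GF(2^4)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:        # degree reached 4: reduce modulo g(x)
            a ^= POLY
    return r

# EXP[i] = g^i for i = 0..14; LOG is the inverse table with 16 entries
# (LOG[0] is a dummy entry, since 0 is not a power of g).
EXP = [1]
for _ in range(14):
    EXP.append(gf16_mul_ref(EXP[-1], G))
LOG = [0] * 16
for i, e in enumerate(EXP):
    LOG[e] = i

def gf16_mul(a, b):
    """c = g^(log a + log b), with zero operands handled separately."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 15]
```

The table-based `gf16_mul` agrees with the reference multiplication on all 256 operand pairs, and `EXP` visits every nonzero field element exactly once, which is what makes 0x02 a generator.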
Further, the computation of the linear transformation L in the round function is adjusted by exploiting the properties of L and of the SIMD instruction set, as follows:
The linear transformation L in the round function is defined as, with $\lll$ denoting a 32-bit left rotation:
$L(V) = V \oplus (V \lll 2) \oplus (V \lll 10) \oplus (V \lll 18) \oplus (V \lll 24)$
The linear transformation L is equivalently transformed into:
This computation needs only 10 SIMD instructions in total: 4 bitwise XORs, 3 shuffles, 1 left shift, 1 right shift, and 1 bitwise OR.
Further, the computation of the reverse order transformation R of SM4 is merged into the inverse arrangement of the messages.
The present invention provides a SIMD instruction based SM4 parallel computing apparatus comprising a processor and a memory, the memory storing a computer program which, when executed by the processor, implements the method of any one of the above.
The invention also proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any of the above.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a parallel computing method for SM4 based on SIMD instructions, a device therefor, and a computer-readable storage medium. The method comprises: arranging a plurality of input SM4 message blocks to obtain arranged message blocks; performing an optimized SM4 encryption or decryption process on the arranged message blocks; computing the Sbox operation in the encryption or decryption process with the composite field technique instead of a table lookup; completing the inversion in GF(2^8) required by the composite field technique with the fast SIMD-based multiplication in GF(2^4) proposed by the invention; and performing an inverse arrangement on the result of the encryption or decryption process to obtain the ciphertext or plaintext corresponding to the message blocks. By combining the equivalent transformation of the nonlinear operation of SM4 based on the composite field technique, the equivalent transformation of the linear operations of SM4 obtained by reordering operations and fusing several linear transformations, and the newly proposed fast SIMD-based multiplication in GF(2^4), the invention provides a new execution process for the SM4 block cipher that is well matched to SIMD instruction sets. It can encrypt and decrypt 4, 8, 16, 32 or more SM4 message blocks in parallel, improves the execution speed of SM4 encryption and decryption, does not depend on a specific hardware platform, and can be implemented on any hardware that supports SIMD instructions.
Compared with the existing method of accelerating SM4 with AESNI instructions, the proposed method does not depend on the specific AESNI instructions, only on more general SIMD instructions. Compared with the existing SM4 implementation based on the Bitslice technique and AVX2 instructions, the proposed method is more general: it supports parallel processing of 4, 8, 16, and 32 blocks and therefore fits more usage scenarios, whereas the implementation based on the Bitslice technique and AVX2 instructions must process 512 SM4 blocks at a time, which limits the scenarios in which it can be used. In addition, in tests in a real environment, the processing speed of the proposed SM4 computation doubles as the number of messages processed in parallel increases. When 16 or 32 blocks are processed in parallel, the measured speed is 20 to 30 percent higher than that of the AESNI and Bitslice methods.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings.
FIG. 1 is a flow chart of a parallel computing method of SM4 based on SIMD instructions according to the present invention;
FIG. 2 is a schematic diagram of an exemplary message arrangement process;
FIG. 3 is a schematic diagram of an inversion operation process; and
FIG. 4 is a diagram illustrating a reverse arrangement process according to an embodiment of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention aims to provide a parallel computing method for SM4 based on SIMD instructions, comprising the following steps:
Step S1: arrange a plurality of input message blocks to obtain arranged message blocks.
The method of the invention can process multiple message blocks in parallel. The following takes an implementation based on 128-bit SIMD instructions, processing 4 message blocks in parallel, as an example to show the message arrangement process, see FIG. 2, where $A_i, B_i, C_i, D_i$ ($i = 0, 1, 2, 3$) are each 32 bits, and the 4 128-bit message blocks are:
$(A_0, A_1, A_2, A_3), (B_0, B_1, B_2, B_3), (C_0, C_1, C_2, C_3), (D_0, D_1, D_2, D_3)$;
After the message arrangement, 4 128-bit registers hold the arranged messages, with the following contents:
$(A_0, B_0, C_0, D_0), (A_1, B_1, C_1, D_1), (A_2, B_2, C_2, D_2), (A_3, B_3, C_3, D_3)$.
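The arrangement step is a word-wise transposition, which can be sketched as follows. The function name `arrange` is an assumption of this sketch, and Python tuples stand in for SIMD registers; on real hardware the same effect would be obtained with unpack or shuffle instructions.

```python
def arrange(blocks):
    """Transpose n blocks of four 32-bit words into four n-lane 'registers'.

    Register i collects word i of every block, so that one SIMD operation
    on register i processes the same word position of all blocks at once.
    """
    return [tuple(block[i] for block in blocks) for i in range(4)]

# Symbolic example mirroring the text: four blocks A, B, C, D.
blocks = [
    ("A0", "A1", "A2", "A3"),
    ("B0", "B1", "B2", "B3"),
    ("C0", "C1", "C2", "C3"),
    ("D0", "D1", "D2", "D3"),
]
registers = arrange(blocks)
# registers[0] == ("A0", "B0", "C0", "D0")
```

The same function works unchanged for 8, 16 or 32 blocks, in which case each "register" simply has more lanes.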
Step S2: for the arranged message blocks, based on SIMD instructions, use GF(2^4) in place of GF(2^8) to complete the Sbox substitution of SM4, thereby carrying out the inversion operation of the round function and obtaining the result of the inversion operation;
The Sbox substitution of SM4 using GF(2^4) in place of GF(2^8) based on SIMD instructions is computed as follows:
The Sbox substitution replaces an 8-bit input with an 8-bit output Sbox(u) according to a 256-byte substitution table, which satisfies:
$Sbox(u) = A \cdot (A \cdot u + C)^{-1} + C$
where A is an 8×8 matrix, C is an 8×1 matrix, u is an 8-bit number, and every element of the matrices A and C is an element of GF(2):
For example, for u = 0x7b, the column-vector form of u (least significant bit first) is $(1,1,0,1,1,1,1,0)^T$. First, the following computation gives v:
The vector $(0,1,1,0,0,1,1,1)^T$ corresponds to v = 0xe6. In GF(2^8), the inverse of v = 0xe6 is $v^{-1}$ = 0xfe, whose vector form is $(0,1,1,1,1,1,1,1)^T$. Then $Sbox(u) = A \cdot v^{-1} + C$:
The vector $(1,1,1,0,0,1,1,1)^T$ corresponds to Sbox(u) = 0xe7, which matches the Sbox definition in the SM4 standard. This is the mathematical basis for replacing GF(2^8) with GF(2^4).
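The inversion step of this worked example can be checked numerically. The sketch below implements multiplication in GF(2^8) modulo $f(x) = 1 + x^2 + x^4 + x^5 + x^6 + x^7 + x^8$ (written as the bit mask 0x1F5) and a brute-force inverse. The function names are assumptions of this sketch, and the affine steps with the matrices A and C are deliberately left out because their values are not reproduced in this text.

```python
F_POLY = 0x1F5   # f(x) = x^8 + x^7 + x^6 + x^5 + x^4 + x^2 + 1

def gf256_mul(a, b):
    """Shift-and-add multiplication in GF(2^8) modulo f(x)."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:       # degree reached 8: reduce modulo f(x)
            a ^= F_POLY
    return r

def gf256_inv(a):
    """Brute-force inverse; 0 is mapped to 0 by convention."""
    for c in range(1, 256):
        if gf256_mul(a, c) == 1:
            return c
    return 0

# The pair from the worked example: v = 0xe6 and its inverse 0xfe.
assert gf256_mul(0xE6, 0xFE) == 1
```

A brute-force search is used only to keep the check short; it is the operation that the composite field technique replaces with arithmetic in GF(2^4).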
The mathematical structure of the SM4 Sbox is defined as: GF(2^8), $f(x) = 1 + x^2 + x^4 + x^5 + x^6 + x^7 + x^8$;
The finite field is defined as: GF(2^4), $g(x) = 1 + x + x^4$;
The mathematical structure of the Sbox is isomorphic to a quadratic algebraic extension of this finite field, where x and y are the variables used in the computation:
The isomorphic mapping matrix $M$ and the inverse isomorphic mapping matrix $M^{-1}$ are defined as:
Thus, write $v \in GF(2^8)$ as $v = a_0 + a_1 y$, where $a_0, a_1 \in GF(2^4)$. Since $v^{-1} \in GF(2^8)$, write $v^{-1} = b_0 + b_1 y$ with $b_0, b_1 \in GF(2^4)$. Then from $(a_0 + a_1 y)(b_0 + b_1 y) = 1$ we have:
That is, the inversion in GF(2^8) can be carried out in GF(2^4)^2, and the computation proceeds as follows:
Here $w^{-1} = Inv(w)$ denotes the inversion in GF(2^4)^2. Following the formulas for $b_0$ and $b_1$, it can be carried out by the process shown in FIG. 3; that is, the inversion in GF(2^8) is converted into addition, multiplication, squaring, and inversion operations in GF(2^4).
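To make the reduction to GF(2^4) arithmetic concrete, the following sketch inverts an element $a_0 + a_1 y$ of a quadratic extension of GF(2^4). The extension polynomial used by the patent is not reproduced in this text, so the sketch assumes a hypothetical one, $y^2 + y + \lambda$ with $\lambda$ = 0x8, chosen only because it is irreducible over GF(2^4) with $g(x) = 1 + x + x^4$; the resulting formulas for $b_0$ and $b_1$ are therefore an illustration of the technique, not the patent's own formulas. All function names are likewise assumptions.

```python
POLY = 0b10011      # g(x) = x^4 + x + 1, as defined in the text
LAMBDA = 0x8        # hypothetical constant: y^2 + y + LAMBDA is irreducible

def gf16_mul(a, b):
    """Shift-and-add multiplication in GF(2^4)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= POLY
    return r

def gf16_inv(a):
    """a^14 equals a^(-1) for nonzero a; 0 maps to 0."""
    r = 1
    for _ in range(14):
        r = gf16_mul(r, a)
    return r

def tower_mul(v, w):
    """(a0 + a1*y)(b0 + b1*y) with y^2 = y + LAMBDA."""
    a0, a1 = v
    b0, b1 = w
    hi = gf16_mul(a1, b1)
    return (gf16_mul(a0, b0) ^ gf16_mul(hi, LAMBDA),
            gf16_mul(a0, b1) ^ gf16_mul(a1, b0) ^ hi)

def tower_inv(v):
    """Inverse via the norm: only GF(2^4) add, multiply, square, invert."""
    a0, a1 = v
    # Norm of v, an element of GF(2^4): a0^2 + a0*a1 + LAMBDA*a1^2.
    delta = gf16_mul(a0, a0) ^ gf16_mul(a0, a1) ^ gf16_mul(LAMBDA, gf16_mul(a1, a1))
    d_inv = gf16_inv(delta)
    return (gf16_mul(a0 ^ a1, d_inv), gf16_mul(a1, d_inv))
```

Under these assumptions, `tower_mul(v, tower_inv(v))` gives `(1, 0)` for every nonzero pair `v`, using one GF(2^4) inversion plus a handful of additions, multiplications and squarings.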
In one embodiment, the multiplication in GF(2^4) used in the inversion in GF(2^4)^2 proceeds as follows: let $GF(2^4)^*$ denote the multiplicative group of GF(2^4); $g = 0x02 \in GF(2^4)^*$ is a generator of $GF(2^4)^*$, i.e. for every $e \in GF(2^4)^*$ there exists $i \le 15$ such that $e = g^i$. To compute $c = a \cdot b$ with $a, b, c \in GF(2^4)$, first compute $\log_g c = \log_g a + \log_g b$, then compute $c = g^{\log_g c}$, where $\log_g a$ and $\log_g b$ are obtained by looking up a LOG table containing 16 elements.
The above is the key optimization of the method. Using the algebraic structure of the Sbox of the SM4 block cipher, the inversion on the Sbox is moved to the isomorphic field GF(2^4)^2, where the proposed optimized multiplication in GF(2^4) accelerates the Sbox substitution. Because the key schedule of the SM4 block cipher shares the same Sbox substitution with the encryption and decryption process, the same computation can also be applied to the key schedule to speed it up. This is an important inventive point of the invention.
Step S3: perform an inverse arrangement on the result of the inversion operation to obtain the ciphertext corresponding to the message blocks. In the invention, the computation of the reverse order transformation R of SM4 is merged into the inverse arrangement of the messages; merging R into the inverse arrangement adds no extra computation, which helps software execution speed.
Taking an implementation based on 128-bit SIMD instructions, processing 4 message blocks in parallel, as an example, the inverse arrangement process is shown in FIG. 4. Following the message arrangement process, after the round function has been executed 32 times, the 4 registers contain:
$(A_{32}, B_{32}, C_{32}, D_{32}), (A_{33}, B_{33}, C_{33}, D_{33}), (A_{34}, B_{34}, C_{34}, D_{34}), (A_{35}, B_{35}, C_{35}, D_{35})$;
After the inverse arrangement fused with the reverse order transformation R, the 4 registers contain:
$(A_{35}, A_{34}, A_{33}, A_{32}), (B_{35}, B_{34}, B_{33}, B_{32}), (C_{35}, C_{34}, C_{33}, C_{32}), (D_{35}, D_{34}, D_{33}, D_{32})$;
that is, the ciphertexts corresponding to the 4 plaintext message blocks. Note that with common SIMD instruction sets, fusing the reverse order transformation R adds no extra instructions.
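The fused step can be sketched as a transposition that reads the registers in reverse order. The function name `dearrange_with_r` is an assumption of this sketch, and tuples again stand in for SIMD registers.

```python
def dearrange_with_r(registers):
    """Undo the word-wise arrangement and apply R in the same pass.

    registers[k] holds word X(32+k) of every block; the output block for
    each lane is (X35, X34, X33, X32), i.e. the registers read back to front.
    """
    lanes = len(registers[0])
    return [tuple(registers[3 - j][lane] for j in range(4)) for lane in range(lanes)]

# Symbolic example mirroring the text.
registers = [
    ("A32", "B32", "C32", "D32"),
    ("A33", "B33", "C33", "D33"),
    ("A34", "B34", "C34", "D34"),
    ("A35", "B35", "C35", "D35"),
]
ciphertexts = dearrange_with_r(registers)
# ciphertexts[0] == ("A35", "A34", "A33", "A32")
```

Reversing the register order costs nothing beyond the transposition itself, which is the sense in which the text says that fusing R adds no extra instructions.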
Addition in GF(2^4) is completed by a bitwise XOR, and squaring and inversion can be completed by building operation tables and using the shuffle instruction common to SIMD instruction sets. For efficient computation, a fast multiplication in GF(2^4) is also needed. The invention proposes to use a generator of GF(2^4) together with a logarithm table and an exponent table to complete multiplication quickly; with a SIMD instruction set this takes only 7 instructions (a SIMD instruction set with 128-bit registers completes 16 multiplications in GF(2^4) with 7 instructions, one with 256-bit registers completes 32 multiplications with 7 instructions, and one with 512-bit registers completes 64 multiplications with 7 instructions).
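The table-based squaring and inversion can be emulated in Python as follows. The function `shuffle` is a software stand-in for a SIMD byte shuffle used as a 16-entry table lookup, one lane per input nibble; it is not a real intrinsic, and the names `SQ`, `INV` and `gf16_add` are assumptions of this sketch. The patent's exact 7-instruction multiplication sequence is not reproduced in this text and is not modelled here.

```python
POLY = 0b10011  # g(x) = x^4 + x + 1

def gf16_mul(a, b):
    """Shift-and-add multiplication in GF(2^4), used only to build tables."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= POLY
    return r

# 16-entry operation tables, one byte per entry, sized to fit one 128-bit register.
SQ = [gf16_mul(a, a) for a in range(16)]
INV = [0] + [next(b for b in range(1, 16) if gf16_mul(a, b) == 1)
             for a in range(1, 16)]

def shuffle(table, lanes):
    """Emulated byte shuffle: every lane looks up its own nibble in the table."""
    return [table[x & 0x0F] for x in lanes]

def gf16_add(x_lanes, y_lanes):
    """Addition in GF(2^4) is a lane-wise XOR."""
    return [x ^ y for x, y in zip(x_lanes, y_lanes)]

lanes = list(range(16))          # 16 field elements, one per byte lane
squares = shuffle(SQ, lanes)     # 16 squarings with one lookup
inverses = shuffle(INV, lanes)   # 16 inversions with one lookup
```

With 16 byte lanes per 128-bit register, a single lookup handles 16 field elements, which matches the per-register operation counts quoted in the text.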
The computation of the linear transformation L in the round function is likewise adjusted by exploiting the properties of L and of the SIMD instruction set, as follows:
The linear transformation L in the round function is defined as, with $\lll$ denoting a 32-bit left rotation:
$L(V) = V \oplus (V \lll 2) \oplus (V \lll 10) \oplus (V \lll 18) \oplus (V \lll 24)$
The linear transformation L is equivalently transformed into:
This computation needs only 10 SIMD instructions in total: 4 bitwise XORs, 3 shuffles, 1 left shift, 1 right shift, and 1 bitwise OR.
By exploiting the properties of the linear transformation L and of the SIMD instruction set, the method adjusts the computation of L so that it can be completed with fewer SIMD instructions, further improving the speed.
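One regrouping of L that is consistent with the stated instruction count is sketched below. It is an assumption of this sketch, not necessarily the formula used in the patent: since rotations by 8, 16 and 24 bits move whole bytes, each can be done with one byte shuffle, and the remaining rotation by 2 bits costs one left shift, one right shift and one OR. The names `rotl32`, `l_standard` and `l_regrouped` are introduced here for illustration.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """32-bit left rotation by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def l_standard(v):
    """L as defined in the SM4 standard."""
    return v ^ rotl32(v, 2) ^ rotl32(v, 10) ^ rotl32(v, 18) ^ rotl32(v, 24)

def l_regrouped(v):
    """L(V) = V ^ ((V ^ V<<<8 ^ V<<<16) <<< 2) ^ (V <<< 24)."""
    # Three whole-byte rotations: one shuffle each on a SIMD unit.
    r8, r16, r24 = rotl32(v, 8), rotl32(v, 16), rotl32(v, 24)
    t = v ^ r8 ^ r16                          # 2 XORs
    t2 = ((t << 2) | (t >> 30)) & MASK32      # left shift, right shift, OR
    return v ^ t2 ^ r24                       # 2 XORs
```

Counting the operations in `l_regrouped` gives 4 XORs, 3 byte rotations, 1 left shift, 1 right shift and 1 OR, that is, 10 operations, and the function agrees with `l_standard` on every input because rotation distributes over XOR.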
The invention proposes a parallel computing apparatus of SM4 based on SIMD instructions, comprising a processor and a memory storing a computer program which, when executed by the processor, implements the method of any one of the above.
The invention also proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any of the above.
For convenience of description, the above devices are described as being functionally separated into various units and described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially implemented or portions thereof that contribute to the prior art may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device, which may be a personal computer, a server, or a network device, etc., to execute the apparatus of the embodiments or some portions of the embodiments of the present application.
Finally, it should be noted that: although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that: modifications and equivalents may be made thereto without departing from the spirit and scope of the invention and it is intended to cover in the claims the invention as defined in the appended claims.
Claims (5)
1. A parallel computing method for SM4 based on SIMD instructions, characterized by comprising the following steps:
Step S1: arrange a plurality of input message blocks to obtain arranged message blocks; $A_i, B_i, C_i, D_i$ ($i = 0, 1, 2, 3$) are each 32 bits, and the 4 128-bit message blocks are: $(A_0, A_1, A_2, A_3), (B_0, B_1, B_2, B_3), (C_0, C_1, C_2, C_3), (D_0, D_1, D_2, D_3)$;
After the message arrangement, 4 128-bit registers hold the arranged messages, with the following contents:
$(A_0, B_0, C_0, D_0), (A_1, B_1, C_1, D_1), (A_2, B_2, C_2, D_2), (A_3, B_3, C_3, D_3)$;
Step S2: for the arranged message blocks, based on SIMD instructions, use GF(2^4) in place of GF(2^8) to complete the Sbox substitution of SM4, thereby carrying out the inversion operation of the round function and obtaining the result of the inversion operation;
The Sbox substitution of SM4 using GF(2^4) in place of GF(2^8) based on SIMD instructions is computed as follows:
The Sbox substitution replaces an 8-bit input with an 8-bit output Sbox(u) according to a 256-byte substitution table, which satisfies:
$Sbox(u) = A \cdot (A \cdot u + C)^{-1} + C$
where A is an 8×8 matrix, C is an 8×1 matrix, u is an 8-bit number, and every element of the matrices A and C is an element of GF(2):
The mathematical structure of the SM4 Sbox is defined as: GF(2^8), $f(x) = 1 + x^2 + x^4 + x^5 + x^6 + x^7 + x^8$;
The finite field is defined as: GF(2^4), $g(x) = 1 + x + x^4$;
The mathematical structure of the Sbox is isomorphic to a quadratic algebraic extension of this finite field:
The isomorphic mapping matrix $M$ and the inverse isomorphic mapping matrix $M^{-1}$ are defined as:
Thus, write $v \in GF(2^8)$ as $v = a_0 + a_1 y$, where $a_0, a_1 \in GF(2^4)$. Since $v^{-1} \in GF(2^8)$, write $v^{-1} = b_0 + b_1 y$ with $b_0, b_1 \in GF(2^4)$. Then from $(a_0 + a_1 y)(b_0 + b_1 y) = 1$ we have:
That is, the inversion in GF(2^8) can be carried out in GF(2^4)^2, and the computation proceeds as follows:
where u is an element of GF(2^8); an affine transformation of u gives v in GF(2^8); the isomorphic mapping takes v to an element w of GF(2^4)^2; the composite field inversion of w gives the element $w^{-1}$ of GF(2^4)^2; the inverse isomorphic mapping takes $w^{-1}$ to s in GF(2^8); and finally an affine transformation of s gives the element t of GF(2^8). $w^{-1} = Inv(w)$ denotes the inversion in GF(2^4)^2;
The multiplication in GF(2^4) used in the inversion in GF(2^4)^2 proceeds as follows: let $GF(2^4)^*$ denote the multiplicative group of GF(2^4); $g = 0x02 \in GF(2^4)^*$ is a generator of $GF(2^4)^*$, i.e. for every $e \in GF(2^4)^*$ there exists $i \le 15$ such that $e = g^i$. To compute $c = a \cdot b$ with $a, b, c \in GF(2^4)$, first compute $\log_g c = \log_g a + \log_g b$, then compute $c = g^{\log_g c}$, where $\log_g a$ and $\log_g b$ are obtained by looking up a LOG table containing 16 elements;
Addition in GF(2^4) is completed by a bitwise XOR, and squaring and inversion can be completed by building operation tables and using the shuffle instruction common to SIMD instruction sets; the multiplication is completed quickly using a generator of GF(2^4), the logarithm table, and the exponent table, and with a SIMD instruction set it takes only 7 instructions;
Step S3: perform an inverse arrangement on the result of the inversion operation to obtain the ciphertext corresponding to the message blocks.
2. The parallel computing method for SM4 based on SIMD instructions according to claim 1, wherein the computation of the linear transformation L in the round function is adjusted by exploiting the properties of L and of the SIMD instruction set, as follows:
The linear transformation L in the round function is defined as:
The linear transformation L is equivalently transformed into:
This computation needs only 10 SIMD instructions: 4 XORs, 3 shuffles, 1 left shift, 1 right shift, and 1 OR.
3. The parallel computing method for SM4 based on SIMD instructions according to any one of claims 1-2, wherein the computation of the reverse order transformation R of SM4 is merged into the inverse arrangement of the messages.
4. A parallel computing device for SM4 based on SIMD instructions, characterized by comprising a processor and a memory, the memory storing a computer program which, when executed by the processor, implements the method of any one of claims 1-3.
5. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the parallel computing method of SIMD instruction based SM4 of any of claims 1-3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010687106.9A CN111736902B (en) | 2020-07-16 | 2020-07-16 | Parallel computing method and device of SM4 based on SIMD (Single instruction multiple data) instructions and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111736902A (en) | 2020-10-02 |
CN111736902B (en) | 2022-04-19 |
Family
ID=72654782
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010687106.9A Active CN111736902B (en) | 2020-07-16 | 2020-07-16 | Parallel computing method and device of SM4 based on SIMD (Single instruction multiple data) instructions and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111736902B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112507644B (en) * | 2020-12-03 | 2021-05-14 | 湖北大学 | Optimized SM4 algorithm linear layer circuit |
CN113282947A (en) * | 2021-07-21 | 2021-08-20 | 杭州安恒信息技术股份有限公司 | Data encryption method and device based on SM4 algorithm and computer platform |
CN114244496B (en) * | 2021-12-01 | 2023-07-18 | 华南师范大学 | SM4 encryption and decryption algorithm parallelization realization method based on tower domain optimization S box |
CN114091086A (en) * | 2022-01-14 | 2022-02-25 | 麒麟软件有限公司 | Rapid realization method of SM4 algorithm based on bit slice |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106712930A (en) * | 2017-01-24 | 2017-05-24 | 北京炼石网络技术有限公司 | SM4 encryption method and device |
CN108092760A (en) * | 2016-11-22 | 2018-05-29 | 北京同方微电子有限公司 | Co-processor device for a block cipher algorithm and non-linear transformation method |
CN109417468A (en) * | 2017-04-12 | 2019-03-01 | 北京炼石网络技术有限公司 | Method and apparatus for secure and efficient block cipher implementation |
CN110166223A (en) * | 2019-05-22 | 2019-08-23 | 北京航空航天大学 | Fast software implementation method of the national cryptographic algorithm SM4 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109600217A (en) * | 2019-01-18 | 2019-04-09 | 江苏实达迪美数据处理有限公司 | Method and processor for optimizing SM4 encryption and decryption in parallel operation mode |
CN109981250B (en) * | 2019-03-01 | 2020-04-07 | 北京海泰方圆科技股份有限公司 | SM4 encryption and key expansion method, device, equipment and medium |
Non-Patent Citations (1)
Title |
---|
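Fast software implementation techniques for SM4; Lang Huan et al.; Journal of University of Chinese Academy of Sciences; 2018-03-31; Vol. 35, No. 2; pp. 180-187 * |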
Also Published As
Publication number | Publication date |
---|---|
CN111736902A (en) | 2020-10-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111736902B (en) | Parallel computing method and device of SM4 based on SIMD (Single instruction multiple data) instructions and readable storage medium | |
CN110166223B (en) | Rapid implementation method of cryptographic block cipher algorithm SM4 | |
CN113940028B (en) | Method and device for realizing white box password | |
JP5711681B2 (en) | Cryptographic processing device | |
US20050232430A1 (en) | Security countermeasures for power analysis attacks | |
CN107257279B (en) | Plaintext data encryption method and device | |
WO2020253108A1 (en) | Information hiding method, apparatus, device, and storage medium | |
CN110880967B (en) | Method for parallel encryption and decryption of multiple messages by adopting packet symmetric key algorithm | |
JP5612007B2 (en) | Encryption key generator | |
Abdellatif et al. | AES-GCM and AEGIS: efficient and high speed hardware implementations | |
JP5689826B2 (en) | Secret calculation system, encryption apparatus, secret calculation apparatus and method, program | |
CN111314054B (en) | Lightweight ECEG block cipher realization method, system and storage medium | |
CN109936437B (en) | power consumption attack resisting method based on d +1 order mask | |
CN110266481B (en) | Post-quantum encryption and decryption method and device based on matrix | |
CN114244496B (en) | SM4 encryption and decryption algorithm parallelization realization method based on tower domain optimization S box | |
Lim et al. | Differential fault attack on lightweight block cipher PIPO | |
CN113691364B (en) | Encryption and decryption method of dynamic S-box block cipher based on bit slice technology | |
Kabulov et al. | Gost R 34.12-2015 (Kuznechik) analysis of a cryptographic algorithm | |
CN105577362B (en) | A kind of byte replacement method and system applied to aes algorithm | |
CN110224829B (en) | Matrix-based post-quantum encryption method and device | |
CN112287333A (en) | Lightweight adjustable block cipher implementation method, system, electronic device and readable storage medium | |
CN113922948B (en) | SM4 data encryption method and system based on composite domain round function | |
KR102253211B1 (en) | Computing Apparatus and Method for Hardware Implementation of Public-Key Cryptosystem Supporting Elliptic Curves over Prime Field and Binary Field | |
CN116915405B (en) | Data processing method, device, equipment and storage medium based on privacy protection | |
JP2013205437A (en) | Method and apparatus for calculating nonlinear function s-box |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||