CN114374507A - Method and system compatible with AES and SM4 block cipher algorithms - Google Patents


Info

Publication number
CN114374507A
Authority
CN
China
Prior art keywords
transformation
aes
round
mapping
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210028032.7A
Other languages
Chinese (zh)
Inventor
邓峰
王良清
王若璨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guowei Group Shenzhen Co ltd
Original Assignee
Guowei Group Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guowei Group Shenzhen Co ltd
Priority to CN202210028032.7A
Publication of CN114374507A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0618 Block ciphers, i.e. encrypting groups of characters of a plain text message using fixed encryption transformation
    • H04L9/0631 Substitution permutation network [SPN], i.e. cipher composed of a number of stages or rounds each involving linear and nonlinear transformations, e.g. AES algorithms
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0618 Block ciphers, i.e. encrypting groups of characters of a plain text message using fixed encryption transformation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/14 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols using a plurality of keys or algorithms

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a method and a system compatible with the two block cipher algorithms AES and SM4. The method comprises: uniformly converting the operation of a single general iteration round, both for the unified round operation of SM4 and for rounds 1 to 8 of the unified round operation of AES, into a globally pipelined process of pre-linear transformation, S-box substitution transformation and post-linear transformation. The pre-linear transformation comprises at least one timing isolation segment, and each timing isolation segment either consists of at least one exclusive-OR gate or is a linear-transformation timing isolation segment that implements shift processing. The invention balances high performance against low resource consumption while supporting both AES and SM4.

Description

Method and system compatible with AES and SM4 block cipher algorithms
Technical Field
The invention relates to the technical field of block encryption algorithms, and in particular to a method and a system compatible with the two block cipher algorithms AES and SM4.
Background
The SM4 symmetric cipher, whose development was organized by China's National Information Security Standardization Technical Committee, is a commercial block cipher algorithm. It was released as a cryptography industry standard in 2012, became a Chinese national standard in 2016, and became an ISO/IEC international standard in June 2021, marking the continued progress of China's commercial cryptography in both technology and international standardization.
The AES symmetric cipher, whose development was organized by the National Institute of Standards and Technology (NIST) and which was formally released in November 2001, has long been the de facto international standard for many applications of symmetric-key encryption.
With the progress of SM4 standardization in recent years and its widespread adoption on many application platforms, there is a frequent need to support both of these mature algorithms on the same platform. In particular, the standard being drawn up in China for the high-speed interface transmission protection system of ultra-high-definition television requires compatible encryption and decryption of SM4 or AES program streams that carry 8K image frame data.
However, existing methods and systems compatible with both SM4 and AES perform poorly: they cannot meet the required throughput while also keeping resource overhead low.
Disclosure of Invention
To solve the technical problem that prior-art methods and systems compatible with both SM4 and AES perform poorly, the invention provides a method and a system compatible with the two block cipher algorithms AES and SM4.
The method compatible with the AES and SM4 block cipher algorithms provided by the invention comprises: uniformly converting the operation of a single general iteration round, both for the unified round operation of SM4 and for rounds 1 to 8 of the unified round operation of AES, into a globally pipelined process of pre-linear transformation, S-box substitution transformation and post-linear transformation;
the pre-linear transformation comprises at least one timing isolation segment, and each timing isolation segment either consists of at least one exclusive-OR gate or is a linear-transformation timing isolation segment that implements shift processing.
Further, the method also comprises: uniformly converting the S-box substitution transformations of AES and SM4 into a locally pipelined process of pre-mapping transformation, composite-field inversion and post-mapping transformation, so that the S-box substitution transformation comprises multiple timing isolation segments, each consisting of at least one exclusive-OR gate and/or at least one AND gate; in one embodiment, the exclusive-OR gate may be a two-input exclusive-OR gate and the AND gate may be a two-input AND gate;
the pre-mapping transformation maps the elements of the GF(2^8) fields used in the S-box substitution transformation, which differ between AES and SM4, into one and the same target composite field that is linearly isomorphic to each original GF(2^8) field.
Further, preset criteria for processing performance and storage resource overhead are met by choosing the sum of the number of timing isolation segments of the pre-linear transformation and the number of timing isolation segments of the S-box substitution transformation that SM4 requires in a single general iteration round.
Further, the operation of each single iteration round from round 9 to the last round of the unified round operation of AES is also uniformly converted into the globally pipelined process of pre-linear transformation, S-box substitution transformation and post-linear transformation.
Further, when the algorithm being processed is AES, the first iteration round contains only the round-key addition (AddRoundKey) sub-transformation; the pre-linear transformation of the other iteration rounds comprises the row shift (ShiftRows) operation of AES; the post-linear transformation comprises an exclusive-OR network transformation that merges the column mixing (MixColumns) and round-key addition operations of AES; and the post-linear transformation of the last iteration round omits the column mixing of AES;
when the algorithm being processed is SM4, the pre-linear transformation comprises the round-key addition operation of SM4, the post-linear transformation comprises an exclusive-OR network transformation that merges the L transformation and the XOR operation of the SM4 round, and the FP permutation is performed after the post-linear transformation of the last round.
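For illustration only, the three-part split of one SM4 round can be sketched in software as follows. This is a minimal model, not the disclosed circuit: the function names are hypothetical, the S-box is passed in as a parameter, and the table used here is an identity placeholder rather than the SM4 S-box. The sketch also assumes that the FP step corresponds to the reverse-order transformation of the SM4 standard.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit word left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def sm4_l(b):
    """L transformation of SM4: XOR of five rotated copies of the word."""
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24)

def sub_word(word, sbox):
    """Apply an 8-bit S-box to each of the four bytes of a 32-bit word."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= sbox[(word >> shift) & 0xFF] << shift
    return out

def sm4_round(state, rk, sbox):
    """One SM4 round written as pre-linear, S-box and post-linear parts."""
    x0, x1, x2, x3 = state
    t = x1 ^ x2 ^ x3 ^ rk                 # pre-linear: round-key addition
    t = sub_word(t, sbox)                 # S-box substitution (4 bytes)
    return (x1, x2, x3, x0 ^ sm4_l(t))    # post-linear: L merged with XOR

def sm4_final(state):
    """Assumed FP step: reverse the order of the four state words."""
    x0, x1, x2, x3 = state
    return (x3, x2, x1, x0)

# Placeholder only: an identity table, NOT the S-box of the SM4 standard.
PLACEHOLDER_SBOX = list(range(256))
```

Only 4 of the 16 state bytes pass through the S-box in each SM4 round; the other three words are simply carried forward, which is why the state has to be moved along the pipeline stages in hardware.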
Further, when the algorithm being processed is AES encryption, the post-mapping transformation performs the inverse of the GF(2^8) isomorphic mapping for AES and then the affine transformation of the AES standard; when the algorithm being processed is AES decryption, the pre-mapping transformation performs the inverse affine transformation of the AES standard and then the GF(2^8) isomorphic mapping for AES;
when the algorithm being processed is SM4, the pre-mapping transformation performs the first affine transformation of the SM4 standard before mapping from GF(2^8) into the target composite field, and the post-mapping transformation performs the second affine transformation of the SM4 standard after the inverse isomorphic mapping from the target composite field back to GF(2^8).
Further, the target composite field is GF((2^4)^2).
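The AES encryption case above can be checked with a small software model. The sketch below is an illustration under stated assumptions, not the mapping matrices of the invention: it assumes GF(2^4) is built modulo x^4 + x + 1, finds by search a constant LAMBDA that makes y^2 + y + LAMBDA irreducible, and finds by search an isomorphism from the AES field (modulo x^8 + x^4 + x^3 + x + 1) into GF((2^4)^2). All function and table names are hypothetical.

```python
def gf16_mul(a, b):
    """Multiply in GF(2^4) modulo x^4 + x + 1 (assumed subfield polynomial)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r

def gf16_inv(a):
    """Inverse in GF(2^4) computed as a^14; maps 0 to 0."""
    r = 1
    for _ in range(14):
        r = gf16_mul(r, a)
    return r

# y^2 + y + LAMBDA is irreducible over GF(2^4) when it has no root there.
LAMBDA = next(l for l in range(16)
              if all(gf16_mul(y, y) ^ y ^ l for y in range(16)))

def comp_mul(a, b):
    """Multiply in GF((2^4)^2); a byte encodes (high << 4) | low = high*y + low."""
    ah, al, bh, bl = a >> 4, a & 15, b >> 4, b & 15
    hh = gf16_mul(ah, bh)
    high = hh ^ gf16_mul(ah, bl) ^ gf16_mul(al, bh)
    low = gf16_mul(hh, LAMBDA) ^ gf16_mul(al, bl)
    return (high << 4) | low

def comp_inv(a):
    """Composite-field inversion: one GF(2^4) inversion plus multiplications."""
    ah, al = a >> 4, a & 15
    d = gf16_mul(gf16_mul(ah, ah), LAMBDA) ^ gf16_mul(ah, al) ^ gf16_mul(al, al)
    di = gf16_inv(d)
    return (gf16_mul(ah, di) << 4) | gf16_mul(ah ^ al, di)

def comp_pow(a, n):
    r = 1
    for _ in range(n):
        r = comp_mul(r, a)
    return r

def eval_bits(bits, beta):
    """XOR together beta^i for every set bit i; this map is GF(2)-linear in bits."""
    acc = 0
    for i in range(9):
        if (bits >> i) & 1:
            acc ^= comp_pow(beta, i)
    return acc

# A root of the AES polynomial inside the composite field defines the isomorphism.
BETA = next(b for b in range(2, 256) if eval_bits(0x11B, b) == 0)
TO_COMPOSITE = [eval_bits(x, BETA) for x in range(256)]    # pre-mapping
FROM_COMPOSITE = [0] * 256                                 # inverse mapping
for x, c in enumerate(TO_COMPOSITE):
    FROM_COMPOSITE[c] = x

def aes_affine(x):
    """Affine transformation of the AES standard."""
    r = x
    for n in (1, 2, 3, 4):
        r ^= ((x << n) | (x >> (8 - n))) & 0xFF
    return r ^ 0x63

def aes_sbox_composite(x):
    """Pre-mapping, composite-field inversion, then post-mapping (inverse map + affine)."""
    return aes_affine(FROM_COMPOSITE[comp_inv(TO_COMPOSITE[x])])
```

In hardware both mappings are fixed 8x8 binary matrices, so each reduces to a small exclusive-OR network; the lookup tables here merely tabulate those linear maps.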
Further, the timing isolation segments are divided so that the combinational path delay of each segment does not exceed the total circuit delay of N levels of basic gate cells, where N is set according to the preset criteria for processing performance and storage resource overhead.
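The division principle can be illustrated with a simple greedy partition. The sketch below is only one possible reading: it assumes the logic is a chain of sub-blocks whose depths are given in gate levels, and that one 128-bit register stage closes every segment. The depth figures, the function names and the register estimate are hypothetical.

```python
def split_into_segments(depths, n_max):
    """Greedily group consecutive sub-block depths so each segment has <= n_max gate levels."""
    segments, current, used = [], [], 0
    for d in depths:
        if d > n_max:
            raise ValueError("a single sub-block exceeds the per-segment limit")
        if used + d > n_max:
            segments.append(current)      # close the segment with a register stage
            current, used = [], 0
        current.append(d)
        used += d
    if current:
        segments.append(current)
    return segments

def register_bits(segments, state_bits=128):
    """Rough storage estimate: one state-wide register stage per segment (assumption)."""
    return len(segments) * state_bits
```

With the hypothetical depths [3, 2, 4, 1, 5], a limit of N = 5 gives three segments and N = 10 gives two, showing how a larger N lowers the register count at the price of a longer combinational path per clock cycle.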
The system implementing the above method compatible with the AES and SM4 block cipher algorithms comprises a plurality of iteration-round circuits, including general iteration-round circuits; a single general iteration-round circuit comprises:
a pre-linear transformation circuit, which implements the row shift operation of AES and the round-key addition operation of SM4;
an S-box substitution transformation circuit, which implements the pre-mapping transformation, the composite-field inversion and the post-mapping transformation;
and a post-linear transformation circuit, which implements the exclusive-OR network transformation (XNWA) that merges the column mixing and round-key addition operations of AES, and the exclusive-OR network transformation (XNWS) that merges the L transformation and the XOR merging operation of SM4.
Further, the pre-linear transformation circuit comprises at least one exclusive-OR gate and a multiplexer.
Based on the above technical scheme, the high-performance implementation of the unified round operation that supports both the AES and SM4 block cipher algorithms is based on a design method with global and local pipeline mechanisms.
In that high-performance implementation, the S-box substitution transformation logic is designed by combining composite-field conversion with the DACSE algorithm.
The low-resource-overhead implementation of the unified round operation follows the design idea of multiplexing the S-box substitution transformation logic as far as possible and multiplexing the other linear transformation logic only to a lesser degree.
The lookup-and-substitute operation of the unified S-box substitution transformation logic, which supports both AES and SM4, uses a circuit structure consisting, in sequence, of pre-mapping transformation, composite-field inversion and post-mapping transformation.
The unified round operation processing logic (RND), which supports both AES and SM4, uses in a single general iteration round a circuit structure consisting, in sequence, of pre-linear transformation, S-box substitution transformation and post-linear transformation.
The independent round-key expansion logic (KEP) supports AES and SM4 separately; the whole expansion operation may be implemented in the local hardware circuit or in an external hardware circuit. Every round key produced by the expansion is held in sequential storage elements (for example, D flip-flops), so that the large amount of round-key data needed by the round operations in the pipeline stages can be obtained simultaneously and quickly.
The switching selection control logic, which lets the hardware autonomously select among several sets of round keys, is implemented with a multiplexer-based circuit structure.
To balance the two implementation targets of high processing performance and low resource overhead, two quantities are traded off according to the application requirements of the overall cryptosystem: the number of storage-element stages that SM4 needs in order to carry its state variable through multiple pipeline stages of the round operation (which is proportional to the hardware resource overhead of that sub-circuit), and the number of timing isolation stages of the related linear transformation logic in the unified round operation (which scales with the highest working frequency of the whole hardware module).
The global-level pipeline mechanism is defined as follows: the round operation logic of the block cipher algorithm as a whole is implemented with a pipelined architecture, that is, the round operation of each iteration round is implemented by its own physically independent circuit logic.
The local-level pipeline mechanism is defined as follows: the processing of a single iteration round of the round operation logic is itself implemented with a pipelined architecture, that is, after the relevant trade-offs are weighed, the sub-transformations of a single round are divided in time into several segments, and each segment is implemented by a separate group of circuit logic that is isolated in timing from the others.
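The effect of both mechanisms can be pictured with a small software model in which every stage (a whole round at the global level, or a timing-isolated segment at the local level) owns an output register and all registers update once per clock. This is only a behavioural sketch; `run_pipeline` and the stage functions are hypothetical stand-ins for the circuit logic.

```python
def run_pipeline(stages, inputs):
    """Clock a chain of stage functions; each stage holds one output register."""
    regs = [None] * len(stages)           # None marks an empty pipeline slot
    outputs = []
    # Extra empty cycles at the end flush the blocks still in flight.
    for item in list(inputs) + [None] * len(stages):
        if regs[-1] is not None:
            outputs.append(regs[-1])      # result leaving the last stage
        # Update from the last stage backwards so each stage reads the
        # previous stage's register value from the preceding clock cycle.
        for i in range(len(stages) - 1, 0, -1):
            regs[i] = stages[i](regs[i - 1]) if regs[i - 1] is not None else None
        regs[0] = stages[0](item) if item is not None else None
    return outputs
```

Once the pipeline is full, one block leaves it per clock cycle regardless of how many stages there are; the stage count only adds latency and register storage.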
The S-box substitution transformation logic of the unified round operation supporting both block cipher algorithms is designed by combining composite-field conversion with the DACSE algorithm, and uses a circuit structure consisting, in sequence, of pre-mapping transformation, composite-field inversion and post-mapping transformation.
The pre-mapping transformation and/or the post-mapping transformation carry out the conversion part of the composite-field conversion, and the composite-field inversion carries out the substitution part on the composite field. Because the pre-mapping and post-mapping transformations are linear in nature, an optimization algorithm, abbreviated DACSE, may additionally be applied to them. Based on the specific physical implementation environment of the target circuit (including but not limited to the underlying platform (ASIC or FPGA), the foundry process and technology, and the EDA tools of the back-end implementation), it analyzes and infers the minimum number of basic gate cells, or the minimum delay, required by the pre-mapping (post-mapping) transformation, so that a trade-off can be made between the two implementation targets of high processing performance and low resource overhead.
The SBOX logic (that is, the S-box substitution transformation logic) is present in every round of both algorithms (except round #0 of AES). As the only nonlinear transformation logic in a block cipher algorithm, it is the most critical part in both timing performance and resource overhead, so multiplexing of the SBOX logic should be pursued as far as possible. The other linear transformation logic of the round operation deserves less multiplexing effort: because the definitions of AES and SM4 differ inherently, a multiplexed implementation of that logic may cost more in resources than it saves.
Rounds #1 to #8 of the AES round operation are implemented by multiplexing the corresponding transformation logic with all rounds of the SM4 round operation (rounds #0 to #31).
Rounds #9 to #N of the AES round operation (N = 10/12/14 for AES-128/192/256) need not be multiplexed with the SM4 round operation; their transformation logic (especially the S-box substitution transformation logic) can be an independent version that supports only the AES algorithm, which saves the corresponding hardware resource overhead.
The target composite field includes, but is not limited to, GF((2^4)^2).
The pre-linear transformation of the unified round operation comprises: the row shift operation of AES and the round-key addition operation of SM4.
The S-box substitution transformation of the unified round operation is: byte substitution applied at byte (8-bit) granularity to all or part of the state variable.
The post-linear transformation of the unified round operation comprises: the exclusive-OR network transformation that merges the column mixing and round-key addition operations of AES, and the exclusive-OR network transformation that merges the L transformation and the XOR merging operation of SM4.
All transformations of a linear nature can be built from basic gate cells such as two-input exclusive-OR gates and two-input AND gates, which makes it possible to weigh the hardware implementation against the two targets of high processing performance and low resource overhead.
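As a hedged illustration of what realizing a linear transformation with two-input exclusive-OR gates costs, the sketch below derives the bit-level dependency matrix of a GF(2)-linear function and counts gates for a naive, unshared balanced-tree implementation. It uses the L transformation of SM4 as the example; the helper names are hypothetical, and a real synthesis flow (or the DACSE optimization mentioned in this document) may share sub-expressions and reach different numbers.

```python
def rotl32(x, n):
    """Rotate a 32-bit word left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sm4_l(b):
    """L transformation of SM4, a GF(2)-linear map on 32 bits."""
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24)

def linear_rows(f, width):
    """rows[j] is a bitmask of the input bits that are XORed into output bit j."""
    cols = [f(1 << i) for i in range(width)]   # image of each unit vector
    rows = []
    for j in range(width):
        mask = 0
        for i, col in enumerate(cols):
            if (col >> j) & 1:
                mask |= 1 << i
        rows.append(mask)
    return rows

def naive_xor_cost(rows):
    """Two-input XOR gate count and gate depth without sub-expression sharing."""
    fan_ins = [bin(r).count("1") for r in rows if r]
    gates = sum(n - 1 for n in fan_ins)
    depth = max((n - 1).bit_length() for n in fan_ins)  # ceil(log2(n)) levels
    return gates, depth
```

Under these assumptions each output bit of L depends on five input bits, so the naive network needs 128 two-input XOR gates at a depth of three gate levels; the depth figure is the kind of quantity that decides where a timing isolation segment has to end.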
The S-box substitution transformation of the unified round operation uses a multiplexed implementation of the differing SBOX logic of AES and SM4. Note that the S-box substitution transformation dedicated to the last few AES rounds may be another multiplexed version of the SBOX logic that does not include SM4 (that is, it multiplexes only the forward and inverse versions of the AES SBOX logic).
The post-linear transformation of the unified round operation implements two separate exclusive-OR network transformation circuits, one for AES and one for SM4, for the reason given above.
Using the sub-transformation reordering method well known in the art, the decryption direction of the AES round operation is made to follow the same flow as the encryption direction, so the two directions do not differ materially in hardware flow control and are not distinguished further here.
In application scenarios where high-performance processing is the primary goal (such scenarios assume that the initial key is updated infrequently), the KEP logic that supplies round-key data to the round operation is implemented independently of the round operation logic.
In the general case, the KEP logic is implemented together with the round operation logic inside the present hardware circuit module.
Optionally, the KEP logic is implemented entirely by another circuit module outside the present hardware circuit module.
Whichever option is chosen, the round keys produced by the expansion are held in sequential storage elements (for example, D flip-flops), so that the large amount of round-key data needed by the round operations in the pipeline stages can be obtained simultaneously and quickly.
As a further option under the general implementation described above, the SBOX logic needed for round-key expansion may reuse the corresponding logic circuits of the round operation to save some hardware resource overhead.
For a given block cipher algorithm, to support scenarios in which ciphertext data encrypted under different initial keys may arrive in an unpredictable order within a continuous period of time, the present hardware circuit module implements a storage circuit that holds a suitable number of round-key sets, each associated with a predefined initial key identifier (KID). Within the global-level pipeline of the round operation, the hardware then selects and uses the correct round-key set in real time, based on the KID and a multiplexer circuit structure.
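The KID-based selection can be modelled in a few lines. The class below is a behavioural sketch only: the name `RoundKeyBank`, the slot count and the method names are hypothetical, and the list indexing stands in for the multiplexer that the hardware would use.

```python
class RoundKeyBank:
    """Holds several pre-expanded round-key sets, indexed by key identifier (KID)."""

    def __init__(self, num_slots):
        self._slots = [None] * num_slots  # one slot per supported KID

    def load(self, kid, round_keys):
        """Store the expanded round keys of one initial key under its KID."""
        self._slots[kid] = tuple(round_keys)

    def select(self, kid, round_index):
        """Multiplexer model: pick the round key of the given round for this KID."""
        key_set = self._slots[kid]
        if key_set is None:
            raise KeyError(f"no round keys loaded for KID {kid}")
        return key_set[round_index]
```

In a pipelined design the KID would travel with each data block from stage to stage, so that every round stage can select its own round key from the set that belongs to that block.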
Given how the SM4 algorithm is defined and how the pipeline is designed, the invention must carry the state variable of the SM4 round operation through multiple pipeline stages.
The number of stages of storage elements (such as D flip-flops) required for this depends on the sum of the numbers of timing isolation segments in the two parts of the SM4 round operation, namely the pre-linear transformation and the S-box substitution transformation. The more timing isolation segments there are, the simpler the computation to be completed in each clock cycle, the higher the clock frequency at which the whole hardware circuit can run stably, and the higher its processing performance; the cost is that the storage resource overhead for carrying the state through the pipeline grows with the number of timing isolation segments.
Thus, if the hardware circuit module does not have to run at its highest clock frequency in a given application, the designer can significantly reduce that storage overhead by reducing the number of timing isolation segments in the two parts of transformation logic. Moreover, when the working clock frequency of the module is limited to a lower level, the EDA tools can select slower basic gate cells during physical implementation, which also helps to save hardware resources.
Drawings
The invention is described in detail below with reference to examples and figures, in which:
FIG. 1 shows the design architecture of a single iteration round of the unified round operation supporting both AES and SM4 according to the invention.
FIG. 2 shows the architecture of the local pipeline for the differing SBOX logic of AES and SM4 according to the invention.
FIG. 3 shows the circuit implementation architecture of the whole block operation supporting both AES and SM4 under the global pipeline mechanism.
FIG. 4 shows the circuit implementation architecture of the basic pipeline logic of the unified round operation under the global pipeline design architecture according to an embodiment of the invention.
FIG. 5 shows the circuit implementation architecture of the SBOX logic of the unified round operation under the local pipeline design architecture according to an embodiment of the invention.
FIG. 6 shows the circuit implementation architecture of the logic other than multiplicative inversion in the multiplexed SBOX logic under the local pipeline design architecture according to an embodiment of the invention.
FIG. 7 shows the circuit implementation architecture of the multiplicative inversion logic on the target composite field in the multiplexed SBOX logic under the local pipeline design architecture according to an embodiment of the invention.
FIG. 8 shows the overall circuit implementation architecture of the unified round operation under the global pipeline mechanism according to an embodiment of the invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Thus, a feature indicated in this specification will serve to explain one of the features of one embodiment of the invention, and does not imply that every embodiment of the invention must have the stated feature. Further, it should be noted that this specification describes many features. Although some features may be combined to show a possible system design, these features may also be used in other combinations not explicitly described. Thus, the combinations illustrated are not intended to be limiting unless otherwise specified.
For a typical symmetric-key block cipher algorithm, both encryption and decryption are in essence iterative processing of a data variable whose bit width is one block. This iteratively processed data block is commonly called the state (or state matrix). AES and SM4 both have a 16-byte block, and a hardware implementation of them that does not prioritize performance generally integrates only one set of sequential logic elements (such as D flip-flops) to register the state after each iteration round of the round operation, with every sub-transformation of every round implemented by combinational logic. Such an architecture, which prioritizes resource overhead, clearly cannot reach ultra-high throughput, so hardware implementations that aim at that target are generally based on a pipeline mechanism.
The invention is built on a pipelined design architecture. The method compatible with the AES and SM4 block cipher algorithms uniformly converts the operation of a single general iteration round, both for the unified round operation of SM4 and for rounds 1 to 8 of the unified round operation of AES, into a globally pipelined process of pre-linear transformation, S-box substitution transformation and post-linear transformation. The pre-linear transformation comprises at least one timing isolation segment, and each timing isolation segment either consists of at least one exclusive-OR gate or is a linear-transformation timing isolation segment that implements shift processing.
The AES algorithm has N+1 iteration rounds, where N is 10, 12 or 14. Every iteration round except the first, which may be denoted round #0, is called a general iteration round. Each general iteration round of AES except the last contains four sub-transformations; the last general iteration round contains only three and is a special general iteration round. The SM4 algorithm has 32 iteration rounds, all of which are general iteration rounds. A single general iteration round, like any other single round, is one iteration round.
In a preferred embodiment, every single iteration round of the unified round operation of AES and SM4 is uniformly converted into the globally pipelined process of pre-linear transformation, S-box substitution transformation and post-linear transformation; that is, the single iteration rounds from round 9 to the last round of the unified round operation of AES are converted in the same way.
When the algorithm being processed is AES, the first iteration round contains only the round-key addition (AddRoundKey) sub-transformation, and the pre-linear transformation of the other iteration rounds comprises the row shift (ShiftRows) operation of AES. The post-linear transformation comprises an exclusive-OR network transformation that merges the column mixing (MixColumns) and round-key addition operations of AES; the post-linear transformation of the last iteration round omits the column mixing.
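The AES ordering described here (row shift first, then byte substitution, then column mixing merged with round-key addition) gives the same result as the order in the AES standard, because the row shift only permutes bytes and therefore commutes with a byte-wise substitution. The following software sketch shows this; it is an illustration rather than the disclosed circuit, the function names are hypothetical, and the S-box is generated from its algebraic definition instead of a lookup table or a composite-field circuit.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def build_aes_sbox():
    """AES S-box: multiplicative inverse (x^254) followed by the affine transformation."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):
            inv = gf_mul(inv, x)
        y = inv
        for n in (1, 2, 3, 4):
            y ^= ((inv << n) | (inv >> (8 - n))) & 0xFF
        sbox.append(y ^ 0x63)
    return sbox

AES_SBOX = build_aes_sbox()

def aes_round(state, round_key, last=False):
    """One general AES round on a 16-byte column-major state (index = 4*col + row)."""
    # Pre-linear transformation: row shift (row r rotates left by r columns).
    s = [state[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]
    # S-box substitution at byte granularity over the whole state.
    s = [AES_SBOX[b] for b in s]
    # Post-linear transformation: column mixing (omitted in the last round)
    # merged with round-key addition.
    out = []
    for c in range(4):
        a = s[4 * c:4 * c + 4]
        if not last:
            a = [gf_mul(a[0], 2) ^ gf_mul(a[1], 3) ^ a[2] ^ a[3],
                 a[0] ^ gf_mul(a[1], 2) ^ gf_mul(a[2], 3) ^ a[3],
                 a[0] ^ a[1] ^ gf_mul(a[2], 2) ^ gf_mul(a[3], 3),
                 gf_mul(a[0], 3) ^ a[1] ^ a[2] ^ gf_mul(a[3], 2)]
        out.extend(a)
    return bytes(x ^ k for x, k in zip(out, round_key))
```

Passing `last=True` models the special last general iteration round, whose post-linear transformation reduces to round-key addition alone.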
When the algorithm of the processing is SM4, the front linear transformation comprises the key adding operation of the SM4, the rear linear transformation comprises the exclusive-or network transformation formed by combining the L transformation and the XOR operation of the SM4, and FP (Final simulation) replacement is performed after the last round of rear linear transformation is finished.
The present invention abstractly presents the design architecture of a single iteration wheel compatible with a unified wheel operation that supports both algorithms as the architectural implementation shown in fig. 1. The implementation framework is centered on the 'S-box replacement transformation' in a single iteration round, the pre-existing sub-transformation operation is defined as 'pre-linear transformation (MLpre)', and the post-existing sub-transformation operation is defined as 'post-linear transformation (MLaft)'. With the abstract division, the architectural design and implementation of the pipeline mechanism of the local level can be further developed in a manner of dividing the three parts in a stepwise manner. Namely, the circuit part for realizing the front linear transformation of the single iteration wheel of the unified wheel operation comprises: the line shift operation of AES, and the round key addition operation of SM 4. The circuit for realizing the S-box replacement transformation of the single iteration wheel of the unified wheel operation is a byte replacement operation which is developed by taking bytes (8-bit) as granularity aiming at the whole or part of a state variable, and the circuit for realizing the post linear transformation of the single iteration wheel of the unified wheel operation comprises the following steps: the column obfuscation and round-key-addition operations of AES are combined into an exclusive-or network transform (XNWA), and the L-transform and XOR-combination operations of SM4 are combined into an exclusive-or network transform (XNWS).
On this basis, the invention adopts the basic idea of performing multiplicative inversion on one and the same target composite field, and uniformly converts the S-box replacement transformations of AES and SM4 into a local pipelined operation process of pre-mapping transformation, composite field inversion transformation, and post-mapping transformation. The S-box replacement transformation thus comprises multiple stages of timing isolation segments, and any stage is composed of at least one exclusive-OR gate and/or at least one AND gate, where the exclusive-OR gate may be a two-input exclusive-OR gate and the AND gate may be a two-input AND gate. The pre-mapping transformation converts the elements of the S-box replacement transformation, which lie in different GF(2^8) fields for AES and SM4, into the same target composite field that is linearly isomorphic to each of the original GF(2^8) fields.
Previous solutions for a dedicated hardware accelerator (i.e., the hardware circuit module described above) compatible with both the SM4 and AES block cipher algorithms generally work as follows: the same physical memory logic (e.g., SRAM) inside the hardware circuit module carries the differing SBOX logic of the two algorithms in a time-multiplexed manner, i.e., while a given algorithm is activated, the memory logic is loaded with the S-box replacement results of that algorithm.
Under the premise of this general solution, SBOX logic usually adopts a look-up table (LUT) implementation, and the access performance of the corresponding memory logic is limited within the framework of the established process of the integrated circuit, so that it often fails to meet the performance index requirement for higher data throughput rate in some specific application scenarios.
If the LUT + SRAM solution is not adopted, and hardware resource overhead is to be saved as much as possible while meeting the requirement of ultra-high throughput, the difficulties and key points are: an efficient multiplexed implementation of the differing SBOX logic of the two algorithms, and an appropriate multiplexed implementation of the other sub-transform logic of the round operation under a pipelined mechanism.
For the efficient multiplexed implementation of the differing SBOX logic of the two algorithms in the round operation, the invention adopts the general idea of 'composite field conversion + multiplicative inversion', so that the nonlinear part of the SBOX logic (namely, the multiplicative inversion over GF(2^8)) has an identical, reusable part under the different definitions of the two algorithms. In addition, the linear part of the SBOX logic itself (i.e., the (inverse) affine transformations) and the isomorphic (inverse) mapping transformations that convert elements between GF(2^8) and the target composite field are both linear transformations, so their specific circuit implementation can optionally be traded off, based on the DACSE optimization algorithm, against the ultra-high throughput/performance target.
Symmetric key cryptographic algorithms such as SM4 and AES involve the concept of finite fields (also known as galois fields, abbreviated as GF) in many of their round of sub-transform operations, as well as arithmetic operations such as addition, multiplication inversion, etc., over the corresponding finite fields. In the context of modern computer applications and CMOS digital logic design, the corresponding data is usually processed/stored/transmitted in binary form due to relevant integration implementation limitations. Therefore, the symmetric key cryptographic algorithms such as AES and SM4 take such characteristics into account in the implementation platform at the beginning of the formulation.
A field is a set of elements on which addition, subtraction, multiplication, and division can be performed without the result leaving the set. A field containing a finite number of elements is called a finite field, and the number of its elements is called the order of the finite field. The order of every finite field is a power of a prime, i.e., it can be written as p^n (p is a prime number, n is a positive integer). Finite fields are also commonly called Galois fields and are denoted GF(p^n).
When n = 1 and p is a prime number, the resulting finite field GF(p) is also called a prime field.
When n = 1 and p = 2, the field is GF(2), the field with the fewest elements. In cryptography its elements are 0 and 1 (1-bit binary numbers), and addition and multiplication on this field correspond to the logical exclusive-or (XOR) and logical AND operations, respectively.
When n > 1 and p = 2, the field is GF(2^n). In cryptography its elements are generally not expressed as integers but as polynomials of degree at most n-1 whose coefficients are elements of GF(2).
In cryptography, finite fields are widely used. The prime number fields GF (p) and GF (2^ n) are most commonly used.
For the symmetric key cryptographic algorithms SM4 and AES, the S-box substitution sub-transform of the round operation works over GF(2^8), where each 8-bit byte corresponds to one of the 256 possible polynomials of degree at most 7. On this basis, addition and subtraction over GF(2^8) are both defined as the XOR of the corresponding coefficients of the two operand polynomials, which is equivalent to the bitwise XOR of the two bytes. Multiplication and division over GF(2^8) additionally require a specified irreducible polynomial of degree 8, modulo which the operations are performed.
Multiplication is thus defined as follows: the two operands are first multiplied as polynomials, and the product is then reduced modulo the irreducible polynomial; the remainder is the multiplication result, which is necessarily an element of GF(2^8). Multiplicative inversion is equivalent to a division whose dividend is the element 1 of GF(2^8). In other words, the multiplicative inverse of an element X is the element Y such that X multiplied by Y, reduced modulo the irreducible polynomial, equals the element 1.
The SBOX logic of the AES and SM4 algorithms clearly differs. Besides the different definitions of the affine transformations, the more critical difference is that the multiplicative inversion over GF(2^8) relies on different irreducible polynomials.
The degree-8 irreducible polynomial of AES is SA(x) = x^8 + x^4 + x^3 + x + 1, and that of SM4 is SS(x) = x^8 + x^7 + x^6 + x^5 + x^4 + x^2 + 1.
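As a minimal sketch of the arithmetic just described, the following Python code multiplies and inverts bytes in GF(2^8) modulo either irreducible polynomial. The names `gf256_mul`, `gf256_inv`, `AES_POLY`, and `SM4_POLY` are illustrative and do not come from the patent. Inversion by exponentiation (a^254) is used here only for brevity; it is not the composite-field circuit of the invention.

```python
AES_POLY = 0x11B  # SA(x) = x^8 + x^4 + x^3 + x + 1
SM4_POLY = 0x1F5  # SS(x) = x^8 + x^7 + x^6 + x^5 + x^4 + x^2 + 1


def gf256_mul(a: int, b: int, poly: int) -> int:
    """Multiply two bytes as polynomials over GF(2), reduced modulo poly."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a      # addition in GF(2^8) is bitwise XOR
        b >>= 1
        a <<= 1
        if a & 0x100:        # degree reached 8: reduce by XORing the modulus
            a ^= poly
    return result


def gf256_inv(a: int, poly: int) -> int:
    """Multiplicative inverse computed as a^254; 0 is mapped to 0 by convention."""
    result = 1
    for _ in range(254):
        result = gf256_mul(result, a, poly)
    return result
```

The same byte generally has different inverses under the two polynomials, which is why a lookup table built for one algorithm cannot simply be reused for the other.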
To find a common part in the multiplicative inversion over GF(2^8) used by the SBOX logic of both AES and SM4, a field decomposition is required that converts the two different GF(2^8) fields into one and the same target composite field, which is linearly isomorphic to each of the original GF(2^8) fields. The multiplicative inversion over GF(2^8) can then be converted into a corresponding isomorphic operation over the target composite field that is better suited to binary digital circuit implementation. In addition, because each GF(2^8) field is linearly isomorphic to the target composite field, the mapping of elements between the two kinds of field can be carried out by an isomorphic (inverse) mapping transformation implemented as a matrix multiplication.
In one embodiment, the target composite field may be GF((2^4)^2), generated from the field GF(2^4) by a specified irreducible polynomial of degree 2, P(y) = y^2 + y + v with v = {0010}_2; GF(2^4) is in turn generated from GF(2) by another specified irreducible polynomial of degree 4, Q(x) = x^4 + x^3 + x^2 + x + 1. The multiplicative inversion over GF(2^8) can thus be isomorphically mapped to a multiplicative inversion over the target composite field. To map elements between the two fields, an isomorphic (inverse) mapping transformation specific to each algorithm is employed. In terms of hardware circuit implementation, the multiplicative inversion is realized by a combination of circuit structures in the order 'isomorphic mapping of elements from GF(2^8) to GF((2^4)^2), multiplicative inversion over GF((2^4)^2), isomorphic inverse mapping of elements from GF((2^4)^2) to GF(2^8)'.
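For orientation, the following derivation shows one standard way to invert an element of GF((2^4)^2) using only GF(2^4) operations, assuming the extension is defined by P(y) = y^2 + y + v as stated in this embodiment. The symbols a_h, a_l, and Δ are notation introduced here for illustration; the circuit of the embodiment may arrange the same arithmetic differently.

```latex
\begin{aligned}
a &= a_h\,y + a_l, \qquad a_h, a_l \in GF(2^4), \qquad y^2 = y + v \\
\Delta &= a_h^2\,v + a_h a_l + a_l^2 \\
a^{-1} &= \left(a_h\,\Delta^{-1}\right) y + \left(a_h + a_l\right)\Delta^{-1}
\end{aligned}
```

Multiplying a by this expression and substituting y^2 = y + v gives a y-coefficient of 0 and a constant term of Δ·Δ^{-1} = 1, so a single inversion in GF(2^4) plus a few GF(2^4) multiplications and additions suffices.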
To support ultra-high-performance implementation of both algorithms, a preferred embodiment of the present invention designs the above pipeline mechanism with both global and local levels to implement the method compatible with the AES and SM4 block cipher algorithms.
Fig. 2 shows the architectural design of the local pipeline for the various SBOX logic for both AES and SM4 algorithms. The AES algorithm uses both forward and reverse versions of SBOX logic for both encryption and decryption directions, while the SM4 algorithm uses the same version of SBOX logic for the encryption/decryption directions. Whatever the algorithm, or the forward version/reverse version SBOX of AES, these SBOX logics can be realized by splitting into sub-transform circuit combinations of 'pre-affine transform, multiplicative inverse over GF (2^8) domain, post-affine transform' in sequence (note: this split corresponds to the original definition of the algorithm standard).
Specifically, the 'pre-affine transformations' of the three SBOX logics (the forward SBOX of AES, the reverse SBOX of AES, and the SBOX of SM4) are, respectively: the null transformation, the inverse affine transformation of the AES standard, and the first affine transformation of the SM4 standard. Their 'post-affine transformations' are, respectively: the affine transformation of the AES standard, the null transformation, and the second affine transformation of the SM4 standard.
Under the general idea of 'composite field conversion + multiplicative inversion' described in this specification, the standard split can be adjusted to the optimized split illustrated in the leftmost column of fig. 2. When the algorithm being processed is AES encryption, the post-mapping transformation performs the isomorphic inverse mapping from the target composite field back to GF(2^8) and then the affine transformation of the AES standard. When the algorithm being processed is AES decryption, the pre-mapping transformation performs the inverse affine transformation of the AES standard before converting from GF(2^8) to the target composite field. When the algorithm being processed is SM4, the pre-mapping transformation performs the first affine transformation of the SM4 standard before converting from GF(2^8) to the target composite field, and the post-mapping transformation performs the second affine transformation of the SM4 standard after the isomorphic inverse mapping from the target composite field back to GF(2^8).
Based on the above optimized split of the SBOX logic, and on the earlier description of the isomorphic implementation of multiplicative inversion over Galois fields by composite field conversion, the present invention abstracts the circuit implementation architecture of the SBOX logic for the unified round operation supporting both algorithms as the sequential combination of pre-mapping transformation, composite field inversion transformation, and post-mapping transformation.
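The three-step structure can be illustrated in software for the forward AES S-box. The sketch below assumes the composite field of the embodiment (Q(x) = x^4 + x^3 + x^2 + x + 1, P(y) = y^2 + y + v, v = {0010}_2) and builds an isomorphism by searching for a root of SA(x) in that field. All function and variable names are hypothetical, and the particular isomorphism found this way is not claimed to be the Aiso matrix of the patent. For readability the post-mapping is kept as two steps (inverse mapping, then affine) rather than merged into one matrix.

```python
Q_POLY = 0b11111   # Q(x) = x^4 + x^3 + x^2 + x + 1, generates GF(2^4)
V = 0b0010         # v in P(y) = y^2 + y + v
AES_POLY = 0x11B   # SA(x) = x^8 + x^4 + x^3 + x + 1


def gf16_mul(a, b):
    """Multiply in GF(2^4) modulo Q(x)."""
    r = 0
    for i in range(4):
        if (b >> i) & 1:
            r ^= a << i
    for i in (6, 5, 4):              # reduce terms of degree 6, 5, 4
        if (r >> i) & 1:
            r ^= Q_POLY << (i - 4)
    return r


def gf16_inv(a):
    """a^14 equals a^-1 in GF(2^4); 0 is mapped to 0."""
    r = 1
    for _ in range(14):
        r = gf16_mul(r, a)
    return r


def cf_mul(a, b):
    """Multiply in GF((2^4)^2); a byte holds (high << 4) | low, with y^2 = y + v."""
    ah, al, bh, bl = a >> 4, a & 15, b >> 4, b & 15
    hh = gf16_mul(ah, bh)
    high = hh ^ gf16_mul(ah, bl) ^ gf16_mul(al, bh)
    low = gf16_mul(hh, V) ^ gf16_mul(al, bl)
    return (high << 4) | low


def cf_inv(a):
    """Composite field inversion using a single GF(2^4) inversion."""
    ah, al = a >> 4, a & 15
    delta = gf16_mul(gf16_mul(ah, ah), V) ^ gf16_mul(ah, al) ^ gf16_mul(al, al)
    d = gf16_inv(delta)
    return (gf16_mul(ah, d) << 4) | gf16_mul(ah ^ al, d)


def poly_eval(poly, beta):
    """Evaluate a polynomial with GF(2) coefficients at beta in the composite field."""
    acc, power = 0, 1
    for i in range(poly.bit_length()):
        if (poly >> i) & 1:
            acc ^= power
        power = cf_mul(power, beta)
    return acc


# Pre-mapping: send x to a root BETA of SA in the composite field, extend linearly.
BETA = next(b for b in range(2, 256) if poly_eval(AES_POLY, b) == 0)
BASIS = [1]
for _ in range(7):
    BASIS.append(cf_mul(BASIS[-1], BETA))


def iso(a):
    """Isomorphic mapping GF(2^8) -> GF((2^4)^2): XOR of basis images (a matrix product)."""
    out = 0
    for i in range(8):
        if (a >> i) & 1:
            out ^= BASIS[i]
    return out


ISO_INV = {iso(a): a for a in range(256)}   # isomorphic inverse mapping as a table


def aes_affine(x):
    """Affine transformation of the AES standard."""
    rot = lambda k: ((x << k) | (x >> (8 - k))) & 0xFF
    return x ^ rot(1) ^ rot(2) ^ rot(3) ^ rot(4) ^ 0x63


def aes_sbox(x):
    """Forward AES S-box as pre-mapping, composite field inversion, post-mapping."""
    return aes_affine(ISO_INV[cf_inv(iso(x))])
```

Because inversion commutes with any field isomorphism, the result does not depend on which of the eight roots is chosen; in hardware, that freedom is what allows the mapping matrices to be selected for low XOR gate count.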
The invention also protects a system for realizing the method compatible with the AES and SM4 block cipher algorithms of the technical scheme, and the system comprises a plurality of iteration wheel realization circuits, wherein a single iteration wheel realization circuit comprises a front linear transformation circuit, an S-box replacing transformation circuit and a rear linear transformation circuit.
The front linear transformation circuit is used to implement the row shift operation of AES and the round key addition operation of SM4.
The S-box replacement transformation circuit is used to implement the pre-mapping transformation, the composite field inversion transformation, and the post-mapping transformation.
The rear linear transformation circuit is used to implement the exclusive-or network transformation (XNWA) formed by combining the column mixing and round key addition operations of AES, and the exclusive-or network transformation (XNWS) formed by combining the L transformation and XOR combination operations of SM4.
Fig. 3 shows a circuit implementation under a global pipeline mechanism for a whole packet operation compatible to support both said algorithms. As mentioned above, the general iteration round of the unified round operation can be realized by the circuit combination of "front linear transformation, S-box replacement transformation, and back linear transformation" in sequence. In addition to this general iteration round, the following points are also noted:
(1) for the SM4 algorithm, after the State (31) is obtained, the 'FP permutation' must be executed to obtain the output grouping result, that is, the SM4 executes the FP permutation after the unified round of operation is completed to obtain the output grouping result.
(2) The #0 round of the AES algorithm has only the "round key added" sub-transform.
(3) The MLaft of round #N of the AES algorithm (i.e., the last round, N = 10/12/14) does not contain column mixing (note: to distinguish it, it is labeled MLaft' in fig. 3).
(4) The resource allocation ratio in the S-box replacement logic boxes of the two algorithms shown in the figure is different, and the specific ratio is 4:1(AES: SM 4). The resource allocation referred to herein is the number of times that the two algorithms call the S-box logic (note: the number of individual S-box logic circuits in a hardware circuit implementation).
Fig. 4 shows a circuit implementation architecture of a basic pipeline logic (circuit sub-module) of the unified round operation under a global pipeline design architecture in an embodiment, that is, a general single iteration round implementation circuit.
K and A/B/C/D respectively represent the round key input and the State stimulus inputs of the submodule, and PO and NA/NB/NC represent the State result outputs of the submodule (PO = Primary Output, NA = New A, and so on). The bit width of each of these signals is one word (32 bits).
The light gray box labeled XOR at the top corresponds to the circuit implementation of the 'front linear transformation' of the SM4 algorithm (note: the row shift of the AES algorithm, which is its front linear transformation, is not included in this basic pipeline logic). The XOR circuit can be implemented by a group of four-input exclusive-or gates, or by several groups of two-input exclusive-or gates.
As can be seen from this specific example, the front linear conversion circuit can be implemented by including at least one xor gate and one multiplexer.
The light gray box labeled SB4X in the middle corresponds to the circuit implementation of the 'S-box replacement transformation' multiplexed by the two algorithms (note: 4X indicates that 4 bytes are replaced).
The light gray boxes labeled XNWA/XNWS at the bottom are the circuit implementation of the 'rear linear transformation', namely the corresponding exclusive-or network sub-transformations of the AES/SM4 algorithms.
An is_AES signal value of 1/0 indicates that the currently active algorithm is AES/SM4, respectively.
The light gray boxes labeled DFFuN (N = 1, 2, ...) correspond to the multi-stage register circuits that the SM4 algorithm requires for the input A/B/C/D signals under the local pipeline design architecture; the number of register stages required equals the total number of timing isolation segment stages of the two sub-circuits XOR and SB4X in the figure. For the trade-off between the two implementation targets of high processing performance and low resource overhead, refer to claim 3 and the related description.
FIG. 5 illustrates a circuit implementation architecture of the SBOX logic of the unified round operation under a local pipeline design architecture, in one embodiment.
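The SM4 data path through this basic pipeline logic can be sketched in software as follows. This is a behavioral illustration only: it ignores the pipeline registers, takes the S-box as a parameter (`sbox` is a hypothetical callable on bytes), and uses the standard SM4 round definition. The argument names `a`, `b`, `c`, `d`, `k` loosely follow the A/B/C/D and K inputs, but the exact correspondence of the returned words to the PO/NA/NB/NC outputs of fig. 4 is an assumption, not something stated in the text.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def sm4_l(b):
    """L transformation of the SM4 standard."""
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24)


def sb4x(word, sbox):
    """Apply a byte-wise S-box to the four bytes of a 32-bit word."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= sbox((word >> shift) & 0xFF) << shift
    return out


def sm4_round(a, b, c, d, k, sbox):
    """One SM4 round split as front linear, S-box replacement, rear linear."""
    t = b ^ c ^ d ^ k          # front linear: four-input XOR including the round key
    t = sb4x(t, sbox)          # S-box replacement on 4 bytes
    new_word = a ^ sm4_l(t)    # rear linear (XNWS): L transformation, then XOR
    return b, c, d, new_word
```

With an identity S-box, `sm4_round(0, 0, 0, 1, 0, lambda v: v)` returns `(0, 0, 1, 0x01040405)`, which exposes the pure XOR-network behavior of the L transformation.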
Aiso and Siso respectively represent isomorphic mapping transformation logic of the two algorithms, which is responsible for isomorphically mapping a certain element in a GF (2^8) domain to a corresponding element in a GF ((2^4) ^2) composite domain, namely, Aiso is isomorphic mapping transformation logic of AES, and Siso is isomorphism mapping transformation logic of SM 4.
Aiso^-1 and Siso^-1 respectively represent the inverses of the two isomorphic mappings, i.e., the isomorphic inverse mapping transformation logic of each algorithm.
Aaf and Aaf^-1 respectively represent the affine transformation logic and the inverse affine transformation logic of the AES algorithm.
Saf1 and Saf2 represent the first affine transformation logic and the second affine transformation logic, respectively, of the SM4 algorithm.
MI on GF ((2^4) ^2) represents the unified multiply-invert logic on the target complex domain.
The three lines from top to bottom in the figure correspond to the SBOX logic of an AES reverse version, an AES forward version and SM4 in sequence.
Fig. 6 shows, in one embodiment, a circuit implementation of the non-multiplicative inversion logic (illustrated in the figure as affine transformation logic Aaf of the AES algorithm) in the multiplexed SBOX logic under a local pipeline design architecture.
Logic of the isomorphic (inverse) mapping class is equivalent to a matrix multiplication. Given that the polynomial coefficients are elements of GF(2) (i.e., 0 or 1), its circuit implementation can be abstracted as follows: the input data X is 8 bits wide, with bits x7, x6, ..., x0; the output data Y is 8 bits wide, with bits y7, y6, ..., y0; and each output bit yi (i ∈ [0,7]) is given by a function yi = fi(x7, x6, ..., x0). In short, each output bit yi is the exclusive-or of several input bits. For example, in one embodiment such mapping logic for the SM4 algorithm can be expressed as follows (where the '^' sign denotes the exclusive-or of single bits):
y7=x6^x5^x4^x3^x2,
y6=x7^x3^x2^x1,
y5=x7^x5^x3^x2,
y4=x5^x3^x2,
y3=x7^x6^x5^x1,
y2=x6^x5^x4^x2,
y1=x6^x5^x2^x1,
y0=x6^x5^x1^x0,
As the above expressions show, each output bit yi has its own exclusive-or function of the xi, i.e., the function fi(x7, x6, ..., x0). Note that since there are only 8 input bits xi, at most 8 operands are XORed to compute any yi.
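The eight expressions above can be evaluated directly in software, which makes the equivalence with a binary matrix product explicit. In this sketch the table `ROWS` is transcribed from the expressions above, while the names `ROWS` and `xor_network` are illustrative only.

```python
# One entry per output bit y7..y0: the indices of the input bits that are
# XORed together (transcribed from the expressions in the text).
ROWS = {
    7: (6, 5, 4, 3, 2),
    6: (7, 3, 2, 1),
    5: (7, 5, 3, 2),
    4: (5, 3, 2),
    3: (7, 6, 5, 1),
    2: (6, 5, 4, 2),
    1: (6, 5, 2, 1),
    0: (6, 5, 1, 0),
}


def xor_network(x: int) -> int:
    """Evaluate y = M * x over GF(2), with one XOR tree per output bit."""
    y = 0
    for out_bit, in_bits in ROWS.items():
        bit = 0
        for i in in_bits:
            bit ^= (x >> i) & 1
        y |= bit << out_bit
    return y
```

Since the map is linear over GF(2), `xor_network(a ^ b)` always equals `xor_network(a) ^ xor_network(b)`, and the widest row (five inputs) needs three levels of two-input XOR gates.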
When RTL coding is carried out on the implementation expression based on HDL language, the implementation expression can be written in a style similar to pseudo code, and can also be written in a manual intervention style based on DACSE optimization algorithm. In one embodiment, the programming style of manual intervention may implement RTL coding using the most basic XOR2/AND2 (two input XOR/AND) gate cell circuit. More generally, for different circuit implementation underlying platforms (such as ASIC or FPGA), different foundry processes (such as SMIC 40LL or UMC 55ULP), or different manufacturers' back-end EDA tools, the RTL codes of the two styles may have unpredictable differences in both performance and resources, so that a comparison analysis of actual implementation results of the two coding styles for a certain combination of implementation scenarios is a policy that can quickly obtain better results.
As with the isomorphic mappings, logic of the (inverse) affine transformation class is also equivalent to a matrix multiplication, but its inputs additionally include a constant 1 or 0 (note: the original affine transformation multiplies the input by a specified constant matrix and then adds a specified 8-bit constant vector). Its function can be expressed as yi = fi(x7, x6, ..., x0, c), where c is 0 or 1. In short, each output bit yi is the exclusive-or of several input bits and a single-bit constant that is either 0 or 1. For example, the first affine transformation logic (Saf1) of the SM4 algorithm can be expressed as follows (where the '^' sign denotes the exclusive-or of single bits and c1 is the single-bit constant 1):
y7=x7^x6^x4^x1^x0^c1,
y6=x7^x6^x5^x3^x0^c1,
y5=x7^x6^x5^x4^x2,
y4=x6^x5^x4^x3^x1^c1,
y3=x5^x4^x3^x2^x0,
y2=x7^x4^x3^x2^x1,
y1=x6^x3^x2^x1^x0^c1,
y0=x7^x5^x2^x1^x0^c1,
As the above expressions show, each output bit yi has its own exclusive-or function of the xi and c1, i.e., the function fi(x7, x6, ..., x0, c). Note that since the 8 input bits xi never all appear in the same expression, at most 8 operands are XORed to compute any yi.
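A short software check of the Saf1 expressions above shows their structure: the eight rows are rotations of one another (a circulant matrix), and the c1 terms form the constant byte 0xD3. In the sketch below, `SAF1_ROWS` is transcribed from the expressions in the text, and the function names are illustrative rather than taken from the patent.

```python
# For each output bit: (input bits XORed together, 1 if c1 is included else 0).
SAF1_ROWS = {
    7: ((7, 6, 4, 1, 0), 1),
    6: ((7, 6, 5, 3, 0), 1),
    5: ((7, 6, 5, 4, 2), 0),
    4: ((6, 5, 4, 3, 1), 1),
    3: ((5, 4, 3, 2, 0), 0),
    2: ((7, 4, 3, 2, 1), 0),
    1: ((6, 3, 2, 1, 0), 1),
    0: ((7, 5, 2, 1, 0), 1),
}


def saf1_bits(x):
    """Evaluate Saf1 bit by bit, exactly as the expressions are written."""
    y = 0
    for out_bit, (in_bits, const) in SAF1_ROWS.items():
        bit = const
        for i in in_bits:
            bit ^= (x >> i) & 1
        y |= bit << out_bit
    return y


def rotl8(x, k):
    """Rotate a byte left by k bits."""
    return ((x << k) | (x >> (8 - k))) & 0xFF


def saf1_rot(x):
    """The same map written as an XOR of byte rotations plus the constant 0xD3."""
    return x ^ rotl8(x, 1) ^ rotl8(x, 3) ^ rotl8(x, 6) ^ rotl8(x, 7) ^ 0xD3
```

Both forms agree on all 256 inputs. In hardware, the XOR with a constant 1 amounts to an inversion of the affected output bit.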
In summary, both types of logic, whether isomorphic (inverse) mapping or (inverse) affine, can be implemented with logic circuits that contain only XOR gate elements. Specifically, under a certain implementation scenario combination, the implementation result circuits obtained based on the two RTL coding styles of the pseudo code and the manual intervention often have certain performance differences in terms of both processing performance and resource overhead.
For example, if a manufacturer's back-end EDA tool is powerful and can complete many iterations in a short time, the pseudo-code coding style is more suitable, since it gives the tool more freedom of automatic selection and thus yields a resulting circuit that better satisfies the implementation constraints. Conversely, if the back-end tool of the EDA manufacturer is known to be less stable, the manual-intervention coding style can be considered, in which specific XORn/ANDn gate units (n being the number of inputs) are instantiated directly to obtain a circuit implementation with predictable path delay. Combined with gate-unit instantiation branches for the various possible implementation scenario combinations, this allows a reusable design, in which the circuit clock frequency constraint is relaxed for a specified performance target in order to reduce resource overhead, to be obtained with minimal modification effort.
FIG. 7 illustrates a circuit implementation of the multiplicative inversion logic on the target composite field of the multiplexed SBOX logic under a local pipeline design architecture, in one embodiment. Assume that the input and output of the multiplicative inversion are Xi and Yo, respectively, that the total circuit delay of n stages of AND2 gate units is written na, and that the total circuit delay of n stages of XOR2 gate units is written nx (n = 1, 2, ...).
The grey italic characters in the figure represent the principle labels of the intermediate variables of the inversion algorithm, the italic characters represent the signal name labels in the circuit implementation, and the italic and underlined characters represent the corresponding logical sub-circuits of multiplication and multiplication inversion over the GF (2^4) base domain.
The gray rectangular boxes with small triangles represent sequential logic elements such as D flip-flops, and the white oval boxes represent the corresponding combinational gate-unit logic (where labels such as 1x and 2a are the total gate-unit circuit delays described above). Each D flip-flop in fig. 7 is the separation point between the timing isolation segments on its left and right.
In the figure, H represents the high 4-bit input data, L represents the low 4-bit input data, and M1/M2/M3 is the output result of the arithmetic circuit in the dashed box at the left side thereof (note: M word indicates that the dashed box circuit is the corresponding multiplication logic).
In the present embodiment, as shown in the figure, the timing isolation segments are divided according to the following principle: the combinational path delay of each isolation segment does not exceed the total circuit delay of N stages of basic gate units, where N is set according to preset criteria for processing performance and storage resource overhead; in one embodiment, for example, the combinational path delay of each isolation segment does not exceed the total circuit delay of 3 stages of basic gate units. For example, in the U_mip sub-circuit, the total circuit delay before its first-stage D flip-flop is '1x + 2a' (note: its 'P-conversion from H_ff2 to HHv' is a bit permutation, which in a hardware circuit implementation can be regarded as introducing no element delay), and the total circuit delay before its second-stage D flip-flop is '3x'. For the embodiment illustrated in FIG. 7, the multiplicative inversion logic on the target composite field is accordingly designed with 6 stages of timing isolation segments.
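The division principle can be illustrated with a deliberately simplified model. The sketch below treats a combinational path as a linear chain of gate levels, each XOR2 ('x') or AND2 ('a') level counting as one unit of delay, and cuts it greedily into segments of at most N levels. The function name and the equal-delay assumption are illustrative; a real netlist is a graph of paths with differing gate delays, so this only conveys the idea.

```python
def split_into_segments(gate_levels, max_levels):
    """Greedily cut a chain of gate levels into timing isolation segments.

    gate_levels: sequence such as ["x", "a", "a"] ("x" = XOR2, "a" = AND2 level).
    max_levels:  N, the largest number of basic gate levels allowed per segment.
    A register (D flip-flop) is implied between consecutive segments.
    """
    segments, current = [], []
    for level in gate_levels:
        if len(current) == max_levels:   # segment is full: insert a register here
            segments.append(current)
            current = []
        current.append(level)
    if current:
        segments.append(current)
    return segments
```

For the U_mip example quoted above, the chain '1x + 2a' followed by '3x' with N = 3 gives `split_into_segments(list("xaaxxx"), 3)`, i.e. two segments of three levels each. Raising N merges segments (fewer registers, lower clock frequency), and lowering N splits them.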
To put it more broadly, the hardware implementation method of the present invention, i.e. the circuit implementation architecture, can conveniently achieve the circuit structure optimization of the key sub-circuit of the multiplication inversion logic on the target composite domain of the multiplexing SBOX logic under different implementation scenario combinations (including but not limited to dimensions of a bottom platform, a factory process, a back-end implementation tool, etc.) by splitting or merging related combinational logic paths and simultaneously adding or deleting corresponding sequential logic elements, thereby obtaining the preferential bias or trade-off consideration on the two implementation indexes of high processing performance and/or low resource overhead.
FIG. 8 illustrates an overall circuit implementation of the unified round operation under the global pipeline mechanism in one embodiment. In this figure, one piece of basic pipeline logic as shown in fig. 4 is represented by a gray rectangle labeled 1pp.
The diagonally hatched boxes in this figure represent the sequential logic elements that register 32-bit-wide portions of the State variable. The white boxes represent the 32-bit-wide portions of a single round key (e.g., k0, k1, k2, k3), which are likewise implemented as sequential logic elements. An is_AES signal value of 1/0 indicates that the currently active algorithm is AES/SM4, respectively. Note: because of the particular nature of the AES row shift sub-transform and the characteristics of the global pipeline architecture, the row shift is kept outside the 1pp sub-circuits in this embodiment.
For simplicity of the figure, multiplexer logic that "autonomously selects the desired round key in real time by hardware for sets of individual round keys" is not included in the figure. Similarly, differences of different iteration rounds of AES operations due to three initial key length modes are not included in the figure.
For the sake of emphasis on pipelining, the output signal, actually integrated in the 1pp sub-circuit, implemented in the form of sequential logic elements, is illustrated outside the 1pp block in this figure. Such as: for round #1 of the AES algorithm, the PO outputs of the four 1pp sub-circuits are shown as s0, s1, s2, s3 in that order; and the PO, NA, NB, NC outputs of the single 1pp sub-circuits for round #0 of the SM4 algorithm are shown as i1, i2, i3, s0, in that order, drawn together.
In accordance with the original definition of the AES algorithm standard, a single iteration round of AES (except round #0) requires 16 S-box replacement transformations, which in this embodiment are computed in parallel by four 1pp sub-circuits (which together integrate 16 multiplexed SBOX logic instances).
According to the original definition of the SM4 algorithm standard, a single iteration round of SM4 requires only 4 S-box replacement transformations, which in this embodiment are computed in parallel by one 1pp sub-circuit (which integrates 4 multiplexed SBOX logic instances).
Combining the original definitions of the two algorithm standards, and considering that round operations #0 to #8 do not differ among the three initial key length modes of AES, it follows that round operations #1 to #8 of the AES algorithm and round operations #0 to #31 of the SM4 algorithm need the same number of SBOX logic instances in the pipelined design architecture, 128 in total. In this embodiment, therefore, these SBOX logic instances can be integrated as multiplexed versions compatible with the different algorithms and the different encryption/decryption directions. For the other iteration rounds of the AES algorithm after round #8, the SBOX logic adopts a simplified multiplexed version that omits the SM4 content and contains only the AES content, while still supporting both encryption and decryption directions, in order to save resource overhead.
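The count of 128 follows directly from the per-round figures given above (16 S-boxes per AES round, 4 per SM4 round) in a fully unrolled pipeline:

```latex
\underbrace{8}_{\text{AES rounds \#1 to \#8}} \times 16
\;=\; 128 \;=\;
\underbrace{32}_{\text{SM4 rounds \#0 to \#31}} \times 4
```

Each AES round therefore occupies four times as many S-box instances as an SM4 round, which matches the 4:1 (AES:SM4) ratio noted for fig. 3.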
The English abbreviations used in this patent, such as AFISO, CFINV, AFIIA, ABIAI, ABII, SAISO, and SIIA, are explained below. Here A = AES and S = SM4; F = Forward and B = Backward; CF = Composite Field and INV = Inversion; ISO = Isomorphic mapping, IIA = Inverse Isomorphic & Affine mapping, II = Inverse Isomorphic mapping, and IA = Inverse Affine mapping.
For the forward version of the AES SBOX logic, the pre-mapping transformation is the isomorphic mapping transformation that maps elements of GF(2^8) to the target composite field (AFISO).
For the forward version of the AES SBOX logic, the finite field inversion is the multiplicative inversion over the target composite field (CFINV).
For the forward version of the AES SBOX logic, the post-mapping transformation comprises the isomorphic inverse mapping transformation that maps elements of the target composite field back to GF(2^8), followed by the affine transformation of the AES standard (AFIIA).
For the inverse version of the AES SBOX logic, the pre-mapping transformation comprises the inverse affine transformation of the AES standard, followed by the isomorphic mapping transformation that maps elements of GF(2^8) to the target composite field (ABIAI).
For the inverse version of the AES SBOX logic, the finite field inversion is the multiplicative inversion over the target composite field (CFINV).
For the inverse version of the AES SBOX logic, the post-mapping transformation is the isomorphic inverse mapping transformation that maps elements of the target composite field back to GF(2^8) (ABII).
For the SBOX logic of the SM4 algorithm, the pre-mapping transformation comprises the first affine transformation of the SM4 standard, followed by the isomorphic mapping transformation that maps elements of GF(2^8) to the target composite field (SAISO).
For the SBOX logic of the SM4 algorithm, the finite field inversion is the multiplicative inversion over the target composite field (CFINV).
For the SBOX logic of the SM4 algorithm, the post-mapping transformation comprises the isomorphic inverse mapping transformation that maps elements of the target composite field back to GF(2^8), followed by the second affine transformation of the SM4 standard (SIIA).
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A method compatible with both the AES and SM4 block cipher algorithms, comprising: uniformly converting the operation of a single general iteration round, for the unified round operation of SM4 and for rounds 1 to 8 of the unified round operation of AES, into a globally pipelined operation consisting of a pre-linear transformation, an S-box substitution transformation and a post-linear transformation;
the pre-linear transformation comprises at least one timing isolation section, and each timing isolation section either consists of at least one XOR gate or is a linear-transformation timing isolation section that implements shift processing.
2. The method of claim 1, further comprising: uniformly converting the S-box substitution transformations of AES and SM4 into a locally pipelined operation consisting of a pre-mapping transformation, a composite field inversion transformation and a post-mapping transformation, so that the S-box substitution transformation comprises multiple timing isolation sections, each of which consists of at least one XOR gate and/or at least one AND gate;
the pre-mapping transformation maps the elements of the different GF(2^8) fields used by AES and SM4 in the S-box substitution transformation into the same target composite field, which is linearly isomorphic to each of the original GF(2^8) fields.
3. The method of claim 2, wherein a preset standard of processing performance and storage resource overhead is met by designing the sum of the number of timing isolation sections of the pre-linear transformation required by SM4 in a single general iteration round and the number of timing isolation sections of the S-box substitution transformation.
4. The method of any one of claims 1 to 3, wherein the operation of a single iteration round, from round 9 to the last round of the unified round operation of AES, is likewise converted into the globally pipelined operation of the pre-linear transformation, the S-box substitution transformation and the post-linear transformation.
5. The method of claim 1, wherein when the algorithm being processed is AES, the first iteration round contains only the round key addition transformation, the pre-linear transformations of the other iteration rounds include the row shift operation of AES, the post-linear transformations include an XOR network transformation that combines the column mixing and round key addition operations of AES, and the post-linear transformation of the last iteration round does not contain the column mixing of AES;
when the algorithm being processed is SM4, the pre-linear transformation includes the round key addition operation of SM4, the post-linear transformation includes an XOR network transformation that combines the L transformation and the XOR operation of the SM4 round, and the FP permutation is performed after the post-linear transformation of the last round is completed.
6. The method of claim 2, wherein when the algorithm being processed is the AES encryption algorithm, the post-mapping transformation performs the affine transformation of the AES standard after performing the GF(2^8) isomorphic inverse mapping corresponding to AES; when the algorithm being processed is the AES decryption algorithm, the pre-mapping transformation performs the inverse affine transformation of the AES standard and then performs the GF(2^8) isomorphic mapping corresponding to AES;
when the algorithm being processed is SM4, the pre-mapping transformation performs the first affine transformation of the SM4 standard before mapping the GF(2^8) field into the target composite field, and the post-mapping transformation of SM4 performs the second affine transformation of the SM4 standard after mapping the target composite field back to the corresponding GF(2^8) field through the GF(2^8) isomorphic inverse mapping.
7. The method of claim 2, wherein the target composite field is the same GF((2^4)^2) field.
8. The method of claim 3, wherein the timing isolation sections are divided according to the principle that the combinational path delay of each isolation section does not exceed the sum of the circuit delays of N stages of basic gate units, and the value of N is set according to the preset standard of processing performance and storage resource overhead.
9. A system for implementing the method compatible with both the AES and SM4 block cipher algorithms of any one of claims 1 to 8, comprising a plurality of iteration round implementation circuits that include a general iteration round implementation circuit, wherein a single general iteration round implementation circuit comprises:
a pre-linear transformation circuit for implementing the row shift operation of AES and the round key addition operation of SM4;
an S-box substitution transformation circuit for implementing the pre-mapping transformation, the composite field inversion transformation and the post-mapping transformation;
and a post-linear transformation circuit for implementing the XOR network transformation (XNWA) that combines the column mixing and round key addition operations of AES, and the XOR network transformation (XNWS) that combines the L transformation and the XOR operation of SM4.
10. The system of claim 9, wherein the pre-linear transformation circuit comprises at least one XOR gate and a multiplexer.
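As an illustration of the SM4 case described in claim 5, a single SM4 round can be written as the same three steps: a linear step before the S-box (round key addition), the S-box substitution, and a linear step after it (the L transformation merged with the final XOR). The sketch below is a software model under stated assumptions, not the claimed circuit: the function names are hypothetical, the 8-bit S-box is passed in as a parameter rather than reproduced from the SM4 standard, and the "FP" step is assumed to correspond to the word-reversing transformation that the SM4 standard applies after the last round.

```python
# Sketch: one SM4 round split into pre-linear / S-box / post-linear steps.
# Names are illustrative; the S-box is supplied by the caller.

MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def sm4_L(b):
    """Linear transformation L of the SM4 round function."""
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24)

def tau(word, sbox):
    """Apply an 8-bit S-box (a callable) to each byte of a 32-bit word."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= sbox((word >> shift) & 0xFF) << shift
    return out

def sm4_round(state, rk, sbox):
    """One round: X[i+4] = X[i] ^ L(tau(X[i+1] ^ X[i+2] ^ X[i+3] ^ rk))."""
    x0, x1, x2, x3 = state
    pre = x1 ^ x2 ^ x3 ^ rk      # linear step before the S-box: round key addition
    sub = tau(pre, sbox)         # S-box substitution on four bytes
    new = x0 ^ sm4_L(sub)        # linear step after the S-box: L merged with the XOR
    return (x1, x2, x3, new)

def sm4_final(state):
    """Word reversal applied after the last round (assumed to be the FP step)."""
    x0, x1, x2, x3 = state
    return (x3, x2, x1, x0)
```

Because both linear steps are pure XOR and rotation networks, only the S-box step differs in kind, which is consistent with placing the AES and SM4 rounds on one shared pipeline.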

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210028032.7A CN114374507A (en) 2022-01-11 2022-01-11 Method and system compatible with AES and SM4 block cipher algorithms

Publications (1)

Publication Number Publication Date
CN114374507A (en) 2022-04-19

Family

ID=81144543

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210028032.7A Pending CN114374507A (en) 2022-01-11 2022-01-11 Method and system compatible with AES and SM4 block cipher algorithms

Country Status (1)

Country Link
CN (1) CN114374507A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination