CN117240601B

CN117240601B - Encryption processing method, encryption processing circuit, processing terminal, and storage medium

Info

Publication number: CN117240601B
Application number: CN202311485835.6A
Authority: CN
Inventors: 任培培; 郭超
Original assignee: Shenzhen Dapu Microelectronics Co Ltd
Current assignee: Shenzhen Dapu Microelectronics Co Ltd
Priority date: 2023-11-09
Filing date: 2023-11-09
Publication date: 2024-03-26
Anticipated expiration: 2043-11-09
Also published as: CN117240601A

Abstract

The encryption processing method reads a key of an asymmetric encryption algorithm; based on an operation expression of the asymmetric encryption operation, expands the power exponent by a binary expansion method according to the bit width of the key, substitutes the information to be processed and the key into the modular exponentiation in the operation expression, and converts the modular exponentiation into modular multiplication operations; invokes at least two multiplication arrays to calculate the modular multiplication operations based on the Montgomery modular multiplication expression; and decrypts or signs the information to be processed according to the operation result. Based on the classical Montgomery modular multiplication principle, the method realizes a high-radix, multi-multiplication-array, pipelined modular multiplication circuit in an integrated circuit design; by optimizing scheduling and resource multiplexing, the working frequency is raised and the operation efficiency is greatly improved.

Description

Encryption processing method, encryption processing circuit, processing terminal, and storage medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to an encryption processing method, an encryption processing circuit, a processing terminal, and a storage medium based on an asymmetric encryption algorithm.

Background

RSA, an asymmetric encryption algorithm, is currently the most widely used public-key cryptosystem. Its core operation is the modular exponentiation of large integers, typically RSA1024 and RSA2048, and with rising security requirements RSA4096 is becoming increasingly common. Large-integer RSA is very hard to break, but encryption and decryption are computationally heavy and slow, so computational efficiency has become the bottleneck restricting RSA applications. There is therefore great interest in efficient implementation techniques for large-integer modular exponentiation.

In the course of conceiving and developing the present application, the applicant found at least the following problems. In algorithm theory, the complexity of implementing RSA is reduced step by step, from modular exponentiation to modular multiplication to multiplication, addition, and shift operations, which are hardware-friendly. A specific implementation circuit, however, must balance resources against efficiency; in most schemes the parallelism is low or the parallel units are small, and the clock frequency is low. Existing schemes are also imperfect in resource multiplexing and parallel scheduling.

Disclosure of Invention

In order to alleviate the above problems, the present application provides an encryption processing method, including:

reading a key of an asymmetric encryption algorithm in response to obtaining information to be processed of the asymmetric encryption algorithm;

Based on an operation expression of the asymmetric encryption operation, adopting a binary expansion method to expand the exponentiation according to the bit width of the key, substituting the information to be processed and the key into modular exponentiation in the operation expression, and converting the modular exponentiation into modular multiplication operation;

based on Montgomery modular multiplication expressions, invoking at least two multiplication arrays and calculating the modular multiplication operation;

and decrypting or signing the information to be processed according to the operation result.

Optionally, the key comprises a non-CRT key; the step of invoking at least two multiplication arrays based on Montgomery modular multiplication expressions to calculate the modular multiplication operation comprises the following steps:

and calling at least two multiplication arrays to be connected in series, and performing single-line iterative computation on the modular multiplication operation, wherein the sum of the total bit widths of the at least two multiplication arrays equals the bit width of the non-CRT key.

Optionally, the key comprises a CRT key; the step of invoking at least two multiplication arrays based on Montgomery modular multiplication expressions to calculate the modular multiplication operation comprises the following steps:

and calling at least two multiplication arrays, and carrying out parallel calculation on the modular multiplication operation, wherein the total bit width of each multiplication array is the same as the bit width of the CRT key.

Optionally, the multiplication array includes a plurality of multiplication units cascaded in turn, and the step of calling at least two multiplication arrays based on the montgomery modular multiplication expression to calculate the modular multiplication operation includes:

and responding to the acquisition of a preset base value representing the unit bit width of the multiplication unit, splitting the calculated variable in the modular multiplication operation according to the preset base value, and respectively and correspondingly inputting the calculated variable to each multiplication unit to carry out iterative operation.

Optionally, the step of splitting the calculated variable in the modular multiplication operation according to the preset base value to respectively input to each multiplication unit for iterative operation in response to obtaining the preset base value representing the unit bit width of the multiplication unit includes:

and for the remainder (modulo) operation in the Montgomery modular multiplication expression, in each iterative calculation of the multiplication array, taking the iteration value of the output result of the least-significant multiplication unit as the result of the remainder operation.

Optionally, for the multiplication in the Montgomery modular multiplication expression, parallel importing split calculation variables according to a multiplication array of a plurality of multiplication units arranged in cascade in sequence, so that the multiplication units of the multiplication array execute the multiplication at the same time; and based on multiplication calculation of the plurality of multiplication units, the product of each stage of multiplication unit is added with the carry of the lower stage of multiplication unit to realize the multiplication operation.

Optionally, for a division operation, in each iterative calculation of the multiplication array, the output result of each multiplication unit is carried to a lower stage multiplication unit to implement the division operation.

Optionally, in response to obtaining a preset base value representing a unit bit width of the multiplication unit, splitting the calculated variable in the modular multiplication operation according to the preset base value, so as to respectively and correspondingly input the calculated variable to each multiplication unit to perform iterative operation, and then the steps include:

and sequentially importing the calculated variables according to the arrangement sequence of the multiplication units in the multiplication array so as to obtain the output result of the multiplication array as the result of the modular multiplication operation.

Optionally, the step of calling at least two multiplication arrays based on the montgomery modular multiplication expression to calculate the modular multiplication operation includes:

executing a preprocessing mode, and calculating the product of an initial value and a preset expansion value to obtain a Montgomery initial expression;

performing power exponent iterative computation on the Montgomery initial expression based on an L-R binary expansion method to obtain a Montgomery iterative expression;

and eliminating a preset expansion value of the Montgomery iterative expression to obtain a calculation result of the modular multiplication operation.

Optionally, the application further provides an encryption processing circuit, which comprises a modular multiplication module, a modular exponentiation control module and a parameter configuration module;

the modular multiplication module comprises at least two multiplication arrays and is used for calculating modular multiplication operation under the control of the modular exponentiation control module, and the multiplication arrays comprise a plurality of multiplication units which are sequentially cascaded;

the modular exponentiation control module is connected with the parameter configuration module and is used for calling the multiplication array based on configuration parameters, and obtaining a modular exponentiation operation result through the modular multiplication operation;

the parameter configuration module is used for generating, according to the calculation requirement of the asymmetric encryption algorithm, the configuration parameters based on the Montgomery modular multiplication expression, and for receiving the modular exponentiation operation result.

Optionally, the modular multiplication module comprises at least two multiplication arrays;

when the asymmetric encryption algorithm is in a CRT mode, the at least two multiplication arrays are called to perform parallel calculation on the modular multiplication operation, and the total bit width of each multiplication array is the same as the bit width of a CRT key in number;

and when the asymmetric encryption algorithm is in a non-CRT mode, the at least two multiplication arrays are called to be connected in series so as to perform single-line iterative computation on the modular multiplication operation, and the sum of the total bit widths of the at least two multiplication arrays is the same as the number of the bit widths of a non-CRT key.

Optionally, the modular exponentiation control module includes a global state control unit connected with the at least two multiplication arrays for performing modular exponentiation scheduling, parameter reading and writing, and initialization process control of the at least two multiplication arrays.

Optionally, the modular exponentiation control module further comprises a multiplication array input management unit connected with the at least two multiplication arrays for managing factor inputs of the multiplication arrays.

Optionally, the modular exponentiation control module further includes a flow scheduling unit, where the flow scheduling unit is connected to the global state control unit, and is configured to adjust the configuration parameter format and perform operation flow scheduling.

Optionally, the modular exponentiation control module further comprises an exponentiation management unit connected with the multiplication array input management unit for arranging exponentiations in binary order for outputting in order bit by bit during operation.

Optionally, the modular exponentiation control module further includes an initialization management unit connected between the global state control unit and the at least two multiplication arrays for scheduling the at least two multiplication arrays for an initialization operation to obtain a variant of the modular parameter.

Optionally, the modular exponentiation control module further includes a first buffer unit and a second buffer unit, where the first buffer unit and the second buffer unit are respectively connected with the global state control unit, the first buffer unit is used for buffering a first factor of the multiplication array operation, and the second buffer unit is used for buffering a second factor of the multiplication array operation.

Optionally, the modular exponentiation control module further includes a parameter storage unit, where the parameter storage unit is connected to the first buffer unit and the second buffer unit, respectively, and is used to store fixed parameters and operation flow parameters of the encryption processing circuit.

Optionally, the parameter configuration module includes a parameter configuration table unit, where the parameter configuration table unit is connected with the modular exponentiation control module and is used to receive and store configuration parameters sent by the host end, so as to be read by the modular exponentiation control module; and receiving and storing the operation result of the modular exponentiation control module for the host side to call.

Optionally, the parameter configuration table unit includes at least two memories, and in a first storage management mode, the at least two memories can be read and written by the modular exponentiation control module at the same time; in the second storage management mode, the at least two blocks of memories are encoded into memories with continuous parameter addresses for the host side to sequentially read and write.

Optionally, the multiplication array includes a plurality of multiplication units cascaded in turn, each multiplication unit including a multiplier, a first adder, and a second adder connected in turn;

the multiplier is used for performing multiplication operation of the first factor and the second factor for the first factor and the second factor read from the modular exponentiation control module so as to output a first calculation result to the first adder;

the first adder is configured to perform a first addition operation of a first calculation result of the current-stage multiplication unit and a first carry data of a lower-stage multiplication unit, so as to output a second calculation result to the second adder;

the second adder is configured to perform a second adding operation of the second calculation result and the output result of the higher-stage multiplication unit, and store and output the output result of the present-stage multiplication unit.

The application also provides a processing terminal comprising an interconnected processor and storage medium, wherein:

the storage medium is used for storing a computer program;

the processor is configured to read the computer program and execute the computer program to implement the encryption processing method as described above.

The application also provides a processing terminal comprising the encryption processing circuit.

The present application also provides a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the encryption processing method as described above.

According to the encryption processing method, encryption processing circuit, processing terminal, and storage medium based on the asymmetric encryption algorithm described above, a key of the asymmetric encryption algorithm is read; based on an operation expression of the asymmetric encryption operation, the power exponent is expanded by a binary expansion method according to the bit width of the key, and the information to be processed and the key are substituted into the modular exponentiation in the operation expression, which is converted into modular multiplication operations; at least two multiplication arrays are invoked to calculate the modular multiplication operations based on the Montgomery modular multiplication expression; and the information to be processed is decrypted or signed according to the operation result. Based on the classical Montgomery modular multiplication principle, the scheme realizes a high-radix, multi-multiplication-array, pipelined modular multiplication circuit in an integrated circuit design; by optimizing scheduling and resource multiplexing, it provides a high-speed modular exponentiation circuit that supports switching between CRT and non-CRT scenarios, raises the working frequency, and greatly improves operation efficiency.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.

Fig. 1 is a flowchart of an encryption processing method according to an embodiment of the present application.

Fig. 2 is a schematic diagram of a cascade of multiplication units according to an embodiment of the present application.

Fig. 3 is a signal input/output diagram of a multiplication unit according to an embodiment of the present application.

Fig. 4 is a block diagram of an encryption processing circuit according to an embodiment of the present application.

Fig. 5 is a schematic diagram of encryption processing circuit connection according to an embodiment of the present application.

FIG. 6 is a diagram illustrating a coding format of a host-side command according to an embodiment of the present application.

Fig. 7 is a schematic diagram of a carry-over process according to the embodiment of fig. 2 of the present application.

Fig. 8 is a schematic diagram of a non-CRT modular multiplication array of the embodiment of fig. 2 of the present application.

Fig. 9 is a schematic diagram of a CRT modular multiplication array of the embodiment of fig. 2 of the present application.

FIG. 10 is a schematic diagram of the connection of multiplication units in the large number multiplication array operation of the embodiment of FIG. 2.

The realization, functional characteristics and advantages of the present application will be further described with reference to the embodiments, referring to the attached drawings. Specific embodiments thereof have been shown by way of example in the drawings and will herein be described in more detail. These drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but to illustrate the concepts of the present application to those skilled in the art by reference to specific embodiments.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the element defined by the phrase "comprising one … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element, and furthermore, elements having the same name in different embodiments of the present application may have the same meaning or may have different meanings, a particular meaning of which is to be determined by its interpretation in this particular embodiment or by further combining the context of this particular embodiment.

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

First embodiment

In one aspect, the present application provides an encryption processing method, and fig. 1 is a flowchart of an encryption processing method according to an embodiment of the present application.

As shown in fig. 1, in an embodiment, the encryption processing method based on the asymmetric encryption algorithm includes:

S100: reading a key of the asymmetric encryption algorithm in response to obtaining information to be processed of the asymmetric encryption algorithm.

Illustratively, RSA, an asymmetric encryption algorithm, is currently the most widely used public-key cryptosystem. Its core operation is the modular exponentiation of large integers, typically RSA1024 and RSA2048, and with rising security requirements RSA4096 is becoming increasingly common. Large-integer RSA is very hard to break, but encryption and decryption are computationally heavy and slow, so computational efficiency has become the bottleneck restricting RSA applications. There is therefore great interest in efficient implementation techniques for large-integer modular exponentiation.

RSA is an asymmetric encryption algorithm, i.e., it uses a paired public key and private key. To generate a key pair, two different large primes $p$ and $q$ are first chosen at random, and $n = p \times q$ and $\Phi(n) = (p-1) \times (q-1)$ are calculated. Next, an integer $e$ satisfying $\gcd(e, \Phi(n)) = 1$ is selected, typically $e = 65537$. Then $d$ is calculated from $e \times d \bmod \Phi(n) = 1$. Finally, $(n, e)$ is published as the public key and $(n, d)$ is kept as the private key, where $n$ is the modulus.
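The key-generation steps above can be sketched in Python with deliberately tiny primes so the numbers can be checked by hand. The function name make_rsa_key and the toy values p = 61, q = 53, e = 17 are illustrative assumptions and do not come from the application; real keys use primes of hundreds of bits.

```python
from math import gcd


def make_rsa_key(p, q, e=65537):
    """Toy RSA key generation; p and q are assumed to be distinct primes."""
    n = p * q                      # modulus n = p * q
    phi = (p - 1) * (q - 1)        # Phi(n) = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to Phi(n)")
    d = pow(e, -1, phi)            # d with e * d mod Phi(n) = 1 (Python 3.8+)
    return (n, e), (n, d)          # (public key, private key)


# Hypothetical toy parameters, far too small for real use.
public_key, private_key = make_rsa_key(61, 53, e=17)
```

With these toy values the private exponent satisfies 17 × d mod 3120 = 1, which is the defining relation between e and d.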

The information to be processed may be plaintext or ciphertext, for example. Alternatively, when the plaintext is acquired, the plaintext may be encrypted according to the key; when the ciphertext is obtained, the ciphertext may be decrypted based on the key.

S200: based on an operation expression of the asymmetric encryption operation, a binary expansion method is adopted to expand the exponentiation according to the bit width of the secret key, the information to be processed and the secret key are substituted into modular exponentiation in the operation expression, and the modular exponentiation is converted into modular multiplication.

The power exponent refers to the private key when RSA signs or decrypts, and to the public key when RSA verifies or encrypts. The exponent bit width is determined by the RSA modulus size; for RSA2048, for example, the maximum exponent bit width is 2048 bits. In general the private key is a full-width 2048-bit value, while the public key is narrower than 2048 bits.

For example, the cost of the modular exponentiation depends on the size of the power exponent exp. For a public key operation, exp = e = 65537 is generally small, but for a private key operation exp = d is a very large integer and the operation takes a long time. RSA private key operations can be optimized with the CRT (Chinese remainder theorem): the large modular exponentiation is split into two small modular exponentiations computed in parallel, which in theory gives a 4-fold speed-up over non-CRT. For example, an RSA1024 private key operation requires one 1024-bit modular exponentiation, whereas a CRT-RSA1024 private key operation requires two 512-bit modular exponentiations executed in parallel. The non-CRT private key has the form (n, d), while the CRT private key has the form (p, q, dp, dq, qinv). The CRT-RSA operational expression is as follows:

$m_1 = C^{dp} \bmod p$; $m_2 = C^{dq} \bmod q$; $h = (qinv \times ((m_1 - m_2) \bmod p)) \bmod p$; $P = m_2 + h \times q$.

The power exponent is expanded by the binary expansion method as $E = [e_{h-1}, \ldots, e_0]_2$, where $h$ is the bit width of the key. The modular exponentiation is thereby reduced to modular multiplications.

RSA operations are in principle modular exponentiations ($M^E \bmod N$). In operation, the plaintext or ciphertext is substituted for $M$ and the key for $E$ and $N$ (where $E$ may be a public key or a private key).
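As a hedged illustration of the CRT-RSA expression, the sketch below evaluates m1, m2, h, and P in Python and compares the result with a direct modular exponentiation. The function name crt_private_op and the toy key are assumptions; dp = d mod (p-1), dq = d mod (q-1), and qinv = q^(-1) mod p are the usual definitions of the CRT key components, which the text does not spell out.

```python
def crt_private_op(c, p, q, dp, dq, qinv):
    """Private-key operation using the CRT form (p, q, dp, dq, qinv)."""
    m1 = pow(c, dp, p)                  # half-size exponentiation mod p
    m2 = pow(c, dq, q)                  # half-size exponentiation mod q
    h = (qinv * ((m1 - m2) % p)) % p    # recombination coefficient
    return m2 + h * q                   # result in the range [0, p*q)


# Hypothetical toy key (p = 61, q = 53, d = 2753), for illustration only.
p, q, d = 61, 53, 2753
dp, dq, qinv = d % (p - 1), d % (q - 1), pow(q, -1, p)
c = 2790
result = crt_private_op(c, p, q, dp, dq, qinv)
```

The two pow calls are independent of each other, which is what allows the two half-size exponentiations to run on separate multiplication arrays at the same time.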

S300: calling at least two multiplication arrays based on the Montgomery modular multiplication expression, and calculating the modular multiplication operation.

Illustratively, $P = C^E$ is solved as follows:

Step 1. Set $P = 1$;

Step 2. For $i$ decreasing from $h-1$ to 0, execute:

Step 2.1. $P = P \times P$;

Step 2.2. If $e_i = 1$, then $P = P \times C$;

Step 3. Output $P = C^E$. Here P, C, and E are generic variables of a mathematical power operation: the procedure expresses, in binary form, how to solve for the result P given a known base C and power exponent E.

For large-number modular multiplication, Montgomery modular multiplication is the most effective approach: it reduces modular multiplication to operations that are easy to implement in hardware, such as shift, addition, and multiplication, and has become the theoretical basis for hardware implementations of the RSA algorithm. The Montgomery modular multiplication expression is as follows:
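Steps 1 to 3 can be sketched as a left-to-right square-and-multiply loop. The version below is a minimal Python illustration that reduces modulo N after every multiplication, as a modular exponentiation would; the function name modexp_lr and its parameters are assumptions made for this sketch.

```python
def modexp_lr(c, e, n_mod, h):
    """Left-to-right binary exponentiation: returns c**e mod n_mod.

    h is the bit width of the exponent; leading zero bits are harmless
    because squaring 1 leaves it unchanged.
    """
    p = 1                                # Step 1
    for i in range(h - 1, -1, -1):       # Step 2: i from h-1 down to 0
        p = (p * p) % n_mod              # Step 2.1: square
        if (e >> i) & 1:                 # Step 2.2: multiply when e_i = 1
            p = (p * c) % n_mod
    return p                             # Step 3


value = modexp_lr(65, 17, 3233, 5)       # toy numbers, 17 = 0b10001
```

Each loop iteration costs one modular squaring plus at most one modular multiplication, so an h-bit exponent needs at most 2h modular multiplications.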

Step a) Precalculate the modified modulus $N_{var} = N \times (N' \bmod \beta)$, where $(-N \times N') \bmod R = 1$, $R = \beta^{n}$, $\beta = 2^{Radix}$, and Radix is the radix;

Step b) Set $S = 0$;

Step c) For $i$ increasing from 0 to $n$, execute (where $n = size_N / Radix$):

c.1) $Q_i = S \bmod \beta$;

c.2) $S = S/\beta + (Q_i \times N_{var})/\beta + A_i \times B + ((Q_i = 0)\ ?\ 0 : 1)$;

Step d) Output $S = A \times B \times R^{-1} \pmod N$.
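Steps a) to d) can be checked with a small software model. The sketch below is a minimal Python rendering under stated assumptions: N is odd, A is smaller than 2^size_N (so the extra digit A_n is 0), the divisions in step c.2 are integer (floor) divisions, and the names montgomery_modmul, n_mod, and size_n are hypothetical. It models the arithmetic only, not the circuit.

```python
def montgomery_modmul(a, b, n_mod, radix, size_n):
    """Return S congruent to a * b * R^-1 (mod n_mod), with R = 2**size_n."""
    beta = 1 << radix
    n = size_n // radix                      # number of radix-sized digits
    r = beta ** n
    # Step a: N' with (-N * N') mod R = 1, then the modified modulus N_var.
    n_prime = (-pow(n_mod, -1, r)) % r
    n_var = n_mod * (n_prime % beta)         # N_var = -1 (mod beta)
    s = 0                                    # Step b
    for i in range(n + 1):                   # Step c: i from 0 to n
        a_i = (a >> (radix * i)) & (beta - 1)
        q_i = s % beta                       # c.1
        # c.2: the final term restores the carry lost by the two floor
        # divisions, since the low digits of S and Q_i*N_var sum to beta.
        s = s // beta + (q_i * n_var) // beta + a_i * b + (0 if q_i == 0 else 1)
    return s                                 # Step d (not fully reduced)


N = 3233                                     # toy odd modulus
s = montgomery_modmul(1234, 567, N, radix=4, size_n=12)
```

The returned S is only congruent to A × B × R^(-1) modulo N; in this sketch a final reduction modulo N is left to the caller.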

Illustratively, to support both the non-CRT and CRT modes, the modular multiplication array can be deployed in two ways. When operating in non-CRT mode, generally for public key operations, all multiplication units (CELLs) are cascaded in sequence, corresponding to one set of latch registers and one set of shift registers. When operating in CRT mode, generally for private key operations, all CELLs are split into 2 independent parallel groups (more than 2 groups are of course also possible), and the multiplication units within each group are cascaded in turn, corresponding to two sets of latch registers and two sets of shift registers respectively. The array is thus multiplexed between CRT mode and non-CRT mode: the large modular multiplication is split into two half-size modular multiplications, parallel modular multiplication is provided to the upper-layer modular exponentiation, and the speed-up can reach up to 4 times.

S400: decrypting or signing the information to be processed according to the operation result.

Generally, a public key is used for encryption/authentication and a private key is used for decryption/signing. The RSA operation expression is as follows:

$C = Enc(P, K_e) = P^{e} \pmod n$, plaintext → ciphertext.

$P = Dec(C, K_d) = C^{d} \pmod n$, ciphertext → plaintext.

It follows that when plaintext is obtained, it may be encrypted or signed with the key; when ciphertext is obtained, it may be decrypted with the key.
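The two expressions amount to one modular exponentiation each. As a small Python illustration, the round trip below encrypts and then decrypts a toy message; the function names and the key (n = 3233, e = 17, d = 2753) are assumptions chosen for readability, and no padding scheme is modelled.

```python
def rsa_encrypt(plain, e, n):
    """C = P^e mod n, using the public exponent e."""
    return pow(plain, e, n)


def rsa_decrypt(cipher, d, n):
    """P = C^d mod n, using the private exponent d."""
    return pow(cipher, d, n)


n, e, d = 3233, 17, 2753          # hypothetical toy key
message = 65
cipher = rsa_encrypt(message, e, n)
recovered = rsa_decrypt(cipher, d, n)
```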

In this embodiment, based on classical Montgomery modular multiplication principles, a high-radix, multi-multiplication array, pipelined, efficient modular multiplication circuit is implemented on an integrated circuit design; and by optimizing the scheduling and the resource multiplexing, a high-speed modular exponentiation circuit scheme capable of supporting the scene switching of the CRT and the non-CRT is realized, the working frequency is improved, and the operation efficiency is greatly improved.

Illustratively, if an ASIC implementation of RSA2048 is designed, the maximum key bit width is size = 2048, the modular multiplication array has 32 CELLs, and a single CELL processes a bit width of 64 bits (i.e., it calls 1 × multi-64 and 2 × add-64). When operating in non-CRT mode, generally for public key operations, the 32 CELLs are cascaded in sequence, corresponding to one set of 2048-bit latch registers B and N and one set of 2048-bit shift registers A.

Illustratively, for the same ASIC implementation of RSA2048 (maximum key bit width size = 2048, 32 CELLs in the modular multiplication array, each CELL 64 bits wide from a resource perspective), when operating in CRT mode, generally for private key operations, all CELLs are split into 2 independent parallel groups, with the 16 CELLs within each group cascaded in turn, corresponding to two sets of 1024-bit latch registers B[0], B[1] and N[0], N[1], and two sets of 1024-bit shift registers A[0], A[1].
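The CELL arithmetic in these two examples can be written down as a small configuration helper. The Python sketch below is illustrative only; plan_arrays and its parameters are hypothetical names, and it simply reproduces the counts given in the text (32 CELLs of 64 bits, either one chain or two chains of 16).

```python
def plan_arrays(size_bits, cell_bits, crt_mode, groups=2):
    """Return the number of cascaded CELLs in each multiplication array."""
    cells = size_bits // cell_bits           # e.g. 2048 // 64 = 32
    if crt_mode:
        return [cells // groups] * groups    # independent parallel chains
    return [cells]                           # one serial chain


non_crt = plan_arrays(2048, 64, crt_mode=False)
crt = plan_arrays(2048, 64, crt_mode=True)
crt_width_bits = crt[0] * 64                 # register width per CRT group
```

The same physical CELLs serve both modes; only the grouping and the register widths change.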

Optionally, the multiplication array comprises a plurality of multiplication units cascaded in turn. The step of invoking at least two multiplication arrays based on Montgomery modular multiplication expressions to perform parallel computation on the modular multiplication operation comprises the following steps:

For example, for step c of the Montgomery expression described above, when it is further broken down for an ASIC implementation, a preset base value (Radix) may be set to characterize the unit bit width, and the main calculation variables may be split according to the preset base value into the input factors of multiple multiplication units.

The least significant bit refers to the 0th bit (i.e., the lowest-order bit) of a binary number. The least significant bit and the most significant bit are corresponding concepts.

Fig. 2 is a schematic diagram of a cascade of multiplication units according to an embodiment of the present application. Fig. 3 is a signal input/output diagram of a multiplication unit according to an embodiment of the present application.

Referring to fig. 2 and 3, the input signal of the present stage multiplication unit CELL may have a first input signal B _i Second input signal N _i Third input signal A _i Fourth input signal Q _i . The calculation result after the CELL operation of the multiplication unit of the present stage is output as an output result S _i 。

Illustratively, in a cascaded multiplication CELL, each stage of multiplication CELL inputs a different stage of data. With a first input signal B _i For example, the first input signal input into the multiplication unit CELL of the present stage is B _i-1 The first input signal of the multiplication unit CELL input of the higher stage is B _i+1 Other input and output signals are analogized in order and are not described in detail herein. In the signal descriptions of the present application, a signal that does not account for the progression defaults to a signal of the present level.

Illustratively, for the Montgomery expression Q in step c above _i = Smodβ may be implemented by taking the preset base value bit of the least significant bit (least significant bit, LSB) of S at each iteration, i.e. taking the iteration value of element S0.

Referring to FIGS. 2-3, for an example, for A in the Montgomery expression _i *B _i And Q _i *N _i Then consider that A.times.B and Q.times.N are split into N products of units and iterateAccumulating, namely arranging a multiplication array according to multiplication units, parallelly importing split N and B values, and importing A and Q in a serial manner according to the cross of the multiplication units, wherein all units simultaneously execute multiplication in parallel, namely simplifying the multiplication into solution A _i *B _i And Q _i *N _i The problem is that the product of the current stage multiplication CELL (i) should be added to the carry of the lower stage multiplication CELL (i-1), i.e. a of a single CELL, taking into account the multiplication carry _i *B _i Should be implemented as LSB (A) _i *B _i )+MSB(A _i -1*B _i -1), the same applies Q _i *N _i Should be implemented as LSB (Q) _i *N _i )+MSB(Q _i -1*N _i -1)。
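The LSB/MSB split can be illustrated in software. In the Python sketch below, one digit of A is multiplied by every unit of B at once, and each output position combines the low half of its own product with the high half of the product from the stage below. The function name digit_times_cells, the 8-bit radix, and the test operand are assumptions; the sketch models the arithmetic identity only, not timing or register structure.

```python
def digit_times_cells(a_digit, b_cells, radix):
    """Multiply one digit by a multi-unit operand, cell by cell.

    b_cells[0] is the least significant unit. Output position j holds
    LSB(product_j) + MSB(product_{j-1}); values may exceed radix bits,
    i.e. the result is kept in an unnormalised (redundant) form.
    """
    mask = (1 << radix) - 1
    products = [a_digit * b for b in b_cells]     # all cells in parallel
    out = []
    for j, prod in enumerate(products):
        carry_in = products[j - 1] >> radix if j > 0 else 0
        out.append((prod & mask) + carry_in)
    out.append(products[-1] >> radix)             # carry out of the top cell
    return out


RADIX = 8
b_value = 0xDEADBEEF
b_cells = [(b_value >> (RADIX * j)) & 0xFF for j in range(4)]
out = digit_times_cells(0xA7, b_cells, RADIX)
total = sum(v << (RADIX * j) for j, v in enumerate(out))
```

Weighting each position by its power of the radix and summing recovers the full product, which is why the per-cell split loses no information.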

Referring to FIGS. 2-3, illustratively, for S/β and (Q * N_var)/β in the Montgomery expression: since β = 2^Radix, the division means that all multiplication units of S are shifted right by the preset base value Radix, which is exactly equivalent to a carry from stage i+1 to stage i. The S/β implementation therefore reduces to a carry from the single unit S_(i+1) down to S_i, and the (Q * N_var)/β implementation reduces to a carry from (Q_(i+1) * N_(i+1)) down to (Q_i * N_i).

With continued reference to FIGS. 2-3, illustratively, all A_i are fed in sequence in the order A_0, ..., A_(n-1) (in the implementation, Q_i need not be fed in, since each Q_i equals the current S_0). The resulting outputs of all multiplication units, [S_(n-1), ..., S_i, ..., S_0], form the Montgomery modular multiplication result S = A * B * R^(-1) mod N.
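The word-serial iteration described above can be sketched in software as follows. This is a minimal model, not the patented circuit: the function name mont_mul is hypothetical, the ordering of the steps inside the loop follows the common textbook form (step c of the patent is not reproduced in this section), and the modulus variant is assumed to satisfy N_var ≡ -1 mod β so that Q_i is simply the lowest word of S.

```python
def mont_mul(a, b, n, radix=64):
    """Return a * b * R^(-1) mod n with R = beta**words, for odd n.

    Assumption: n_var = n * (-n^(-1) mod beta), so n_var == -1 (mod beta)
    and the quotient digit q is just the lowest word of s.
    """
    beta = 1 << radix
    words = -(-n.bit_length() // radix)          # number of units
    n_var = n * ((-pow(n, -1, beta)) % beta)     # modulus variant (assumed form)
    s = 0
    for i in range(words):
        a_i = (a >> (radix * i)) & (beta - 1)    # feed A_0, ..., A_(n-1) in order
        s += a_i * b
        q = s & (beta - 1)                       # Q_i = S mod beta = current S_0
        s = (s + q * n_var) >> radix             # exact division by beta
    return s % n                                 # final reduction into [0, n)
```

Because n_var is a multiple of n, adding q * n_var never changes the residue modulo n, and the final reduction brings the result back into the modulus range.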

Illustratively, RSA performs a modular exponentiation, which is converted into modular multiplications by the L-R (left-to-right) binary expansion method. Because the underlying modular multiplication circuit is based on the Montgomery algorithm, it can only realize the expression S = A * B * R^(-1) mod N. To implement the modular exponentiation expression P = C^E mod N, the modular multiplication circuit must therefore be scheduled for preprocessing and post-processing in addition to the square and multiply steps. The preprocessing multiplies the initial input by R, converting it to Montgomery representation. The result SR obtained by the exponent iteration of the L-R binary expansion method is still in Montgomery representation. The post-processing eliminates R from SR to obtain the final target result S.
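The preprocessing, square-and-multiply iteration and post-processing can be sketched as below. This is an illustrative model under assumptions: mont_mul here is only a behavioural stand-in for the modular multiplication circuit (it computes A * B * R^(-1) mod N directly), and the names mont_mul and mont_modexp are hypothetical.

```python
def mont_mul(a, b, n, r):
    """Behavioural model of the circuit: a * b * R^(-1) mod n."""
    return a * b * pow(r, -1, n) % n

def mont_modexp(c, e, n, radix=64):
    """Compute c**e mod n using only Montgomery multiplications (n odd)."""
    words = -(-n.bit_length() // radix)
    r = 1 << (radix * words)
    r2 = r * r % n
    c_r = mont_mul(c, r2, n, r)       # preprocessing: C -> C*R
    p = mont_mul(1, r2, n, r)         # accumulator starts at 1*R
    for i in range(e.bit_length() - 1, -1, -1):   # L-R: most significant bit first
        p = mont_mul(p, p, n, r)      # square
        if (e >> i) & 1:
            p = mont_mul(p, c_r, n, r)  # multiply when the exponent bit is 1
    return mont_mul(p, 1, n, r)       # post-processing: remove R
```

The result agrees with Python's built-in pow(c, e, n), which makes the sketch easy to check against a reference.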

Second embodiment

The present application further provides an encryption processing circuit, and fig. 4 is a block diagram of the encryption processing circuit according to an embodiment of the present application.

As shown in fig. 4, in an embodiment, the encryption processing circuit includes a modular multiplication module 1, a modular exponentiation control module 2, and a parameter configuration module 3.

The modular multiplication module 1 comprises at least two multiplication arrays, which are used for calculating the modular multiplication operation under the control of the modular exponentiation control module 2. Each multiplication array comprises a plurality of multiplication units cascaded in sequence.

Illustratively, the multiplication array may include the same number of multiplication units as the key bit width of the asymmetric encryption algorithm.

The modular exponentiation control module 2 is connected with the parameter configuration module 3 and is used for calling the multiplication array based on configuration parameters, and obtaining a modular exponentiation operation result through the modular multiplication operation.

The parameter configuration module 3 is configured to generate the configuration parameters based on the Montgomery modular multiplication expression, according to the calculation requirements of the asymmetric encryption algorithm, and to receive the modular exponentiation result.

Referring to FIG. 5, in one embodiment, the modular multiplication module 1, the modular exponentiation control module 2, and the CRT-RSA acceleration circuitry are built layer by layer from bottom to top.

The modular multiplication layer is the multiplication array layer, namely the modular multiplication module (the Share Multiplier module), which realizes resource multiplexing between Montgomery modular multiplication and large-number multiplication. A, B and M are the inputs. In mont mode, the output S undergoes a reduction, i.e. remainder processing of the output, to ensure that the output value always stays within the mod domain.

The modular exponentiation layer includes all modules within the dashed box except the modular multiplication layer.

Configuration or data input is encoded by the host and sent to the circuit through the system bus interface via the parameter configuration module 3. Illustratively, in the parameter configuration module 3, the input is parsed by the Decoder, i.e. the data parsing unit, and the hardware automatically extracts the configuration or parameters. The output result or state after the operation is stored in the result and status address spaces of the MEMWrapper parameter table, read and encoded by the Decoder, and then returned to the host through the system bus interface.

When the asymmetric encryption algorithm is in CRT mode, the at least two multiplication arrays can be called to compute the modular multiplication operation in parallel, and the total bit width of each multiplication array equals the bit width of the CRT key.

When the asymmetric encryption algorithm is in nonCRT mode, the at least two multiplication arrays can be called in series to compute the modular multiplication operation by single-line iteration, and the sum of the total bit widths of the at least two multiplication arrays equals the bit width of the nonCRT key. Single-line iterative calculation can be understood as an iterative calculation with all multiplication arrays connected in series.

Illustratively, when the asymmetric encryption algorithm is in CRT mode, the at least two multiplication arrays may include a first multiplication array and a second multiplication array; when the asymmetric encryption algorithm is in nonCRT mode, the at least two multiplication arrays may be connected in series to form a third multiplication array.

Illustratively, if an ASIC implementation of RSA2048 is designed, the maximum key bit width is Size = 2048, the number of CELLs in the modular multiplication array is 32, and, from a resource perspective, a single CELL handles a 64-bit width. When operating in nonCRT mode, typically for public key operations, the 32 CELLs are cascaded in sequence, corresponding to a set of 2048-bit latch registers B and N and a 2048-bit shift register A.

Illustratively, the global state control unit may include a first global control state machine and a second global control state machine. As shown in FIG. 5, the first global control state machine (i.e., ModExpCtrl[0]) can be connected to the first multiplication array or the third multiplication array, for modular exponentiation scheduling, parameter reading and writing, and initialization process control of the first or third multiplication array. The second global control state machine (i.e., ModExpCtrl[1]) can be connected to the second multiplication array, for modular exponentiation scheduling, parameter reading and writing, and initialization process control of the second multiplication array.

With continued reference to FIG. 5, ModExpCtrl is illustratively the global state control unit. It comprises two global control state machines, the first and the second, which are responsible for modular exponentiation scheduling, MEM access (parameter reading and writing), and init (initialization process) control.

With continued reference to FIG. 5, illustratively, Management, i.e. the multiplication array input management unit, may manage the A and B inputs of the underlying multiplication units based on control information from ModExpCtrl, shift information from ExpBit, and cached values from Preg (the first cache unit) and Creg (the second cache unit).

With continued reference to fig. 5, illustratively, CRTCtrl is a flow scheduling unit for taking charge of CRT flow scheduling and CRT mode parameter variable format adjustment to be compatible with the transmission format of the non-CRT mode (mod= { p, q }, exp= { dp, dq }, msg= { c, qinv }).

With continued reference to fig. 5, an ExpBit, i.e., a power exponent management unit, is illustratively used to manage exp, arrange exp in binary, and shift output left starting from the most significant bit.
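The behaviour described for ExpBit, namely holding exp in binary and shifting it out to the left starting from the most significant bit, can be modelled with a short generator. This is a sketch only; the function name and the fixed register width parameter are assumptions for illustration.

```python
def exp_bits_msb_first(exp, width):
    """Yield the bits of exp, most significant first, from a left-shifting register."""
    mask = (1 << width) - 1
    reg = exp & mask
    for _ in range(width):
        yield (reg >> (width - 1)) & 1   # output the current top bit
        reg = (reg << 1) & mask          # shift left by one position
```

For example, list(exp_bits_msb_first(0b1011, 4)) gives [1, 0, 1, 1], the order in which a left-to-right square-and-multiply loop consumes the exponent.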

With continued reference to FIG. 5, illustratively, Initial is the initialization management unit, responsible for initializing the algorithm circuit. It mainly schedules the multiplier to perform a modular exponentiation in the R domain to obtain the inverse M_INV of mod, and then calculates the variant N_var of mod. The initialization process is the process of obtaining N_var.

A multiplication multiplies at least two factors to obtain a product. The present application provides two factors: the first factor is held in the first cache unit and the second factor is held in the second cache unit, from which they are fetched. Illustratively, for A_i * B_i and Q_i * N_i in the Montgomery expression, A * B and Q * N are split into the products of n units. Then A_i is the first factor and B_i the second factor, or Q_i is the first factor and N_i the second factor. With continued reference to FIG. 5, illustratively, Preg is the first cache unit and Creg is the second cache unit; both may also be used to cache intermediate results of the iteration during exponentiation.

Optionally, the modular exponentiation control module further includes a parameter storage unit, where the parameter storage unit is connected to the first buffer unit and the second buffer unit, respectively, and is used to store parameter data such as fixed parameters and operation flow parameters of the encryption processing circuit.

With continued reference to FIG. 5, illustratively, Parameter is the parameter storage unit, which stores fixed parameters including Radix = 64, n = Size/Radix (Size being 2048 or 1024 for RSA2048/RSA1024), β = 2^Radix, R = β^n, and the corresponding CRT parameters CRT_n = n/2 and CRT_R = β^(n/2).
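As a quick numeric check of the fixed parameters listed above, the following sketch derives them from Size and Radix. The function name fixed_parameters and the dictionary layout are hypothetical; only the formulas come from the text.

```python
def fixed_parameters(size, radix=64):
    """Derive the fixed parameters from the RSA modulus bit width (Size)."""
    n = size // radix                 # number of units
    beta = 1 << radix                 # beta = 2^Radix
    return {
        "Radix": radix,
        "n": n,
        "beta": beta,
        "R": beta ** n,               # R = beta^n = 2^Size
        "CRT_n": n // 2,
        "CRT_R": beta ** (n // 2),    # R for each half-width CRT operand
    }
```

For Size = 2048 this gives n = 32 and CRT_n = 16, matching the 32-cell array described for RSA2048.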

With continued reference to FIG. 5, MEMWrapper is illustratively the parameter configuration table unit. The flow scheduling unit in the modular exponentiation control module may be the CRTCtrl module in the figure. Optionally, during operation, configuration or data input is encoded by the host, issued to the circuit through the system bus interface, and parsed by the Decoder, i.e. the data parsing unit, with the hardware automatically extracting the configuration or parameters. Output results or states are stored in the result and status address spaces of the MEMWrapper parameter table, read and encoded by the Decoder, and returned to the host through the system bus interface.

With continued reference to FIG. 5, illustratively, the MEMWrapper circuit provides parameter storage and address decoding functions. It incorporates 2 blocks of single-port RAM, each of depth 80 x width 64, to match the Radix = 64 processing units of the underlying engine and to improve the internal access efficiency of CRT and nonCRT parameters.

Address decoding provides two mapping modes. On the RSA engine side, the 2 RAM blocks may be read and written simultaneously, which improves the throughput of parameter reading and writing during operation. On the host user side, the memory appears as a single RAM with contiguous parameter addresses; the user can write or read all parameters continuously by setting an initial offset, and the decoding circuit supports offset auto-increment.

Optionally, the parameter configuration module further includes a data analysis unit, where the data analysis unit is connected between the parameter configuration table unit and the host end, and is configured to analyze the configuration parameters sent by the host end to store the configuration parameters in the parameter configuration table unit; and the operation result of the parameter configuration table unit is analyzed to return to the host end.

With continued reference to FIG. 5, the Decoder is illustratively the data parsing unit. Optionally, during operation, configuration or data input is encoded by the host, issued to the circuit through the system bus interface, and parsed by the Decoder circuit, with the hardware automatically extracting the configuration or parameters. Output results or states are stored in the result and status address spaces of the MEMWrapper parameter table, read and encoded by the Decoder circuit, and returned to the host through the system bus interface. The system bus interface and the algorithm circuit communicate through the MEM, which both can access. While configuration is being issued or results are being read, the MEM is accessed by the system bus. Once configuration is complete and the circuit is started, the MEM access rights switch automatically to the algorithm circuit until the operation finishes; when the final result or state is to be output, the MEM rights switch back to the system bus. Note: MEM access rights default to the system bus, and the circuit holds them only while it is busy.

For a first factor and a second factor read from the modular exponentiation control module, the multiplier performs the multiplication of the first factor and the second factor and outputs a first calculation result to the first adder.

In one embodiment, the first end of the first multiplier MULT may receive the first input signal B _i Or a second input signal N _i A second end of the first multiplier MULT may receive the third input signal A _i Or a fourth input signal Q _i . The first multiplier MULT may output the first calculation result to the first adder ADD1 through the first output terminal after the multiplication operation.

Optionally, the first multiplier MULT is connected to a higher-level multiplication unit CELL through a second output end, so as to output first carry data of the multiplication operation of the present-level multiplication unit CELL to the higher-level multiplication unit CELL.

With continued reference to fig. 2, the first input terminal of the first adder ADD1 may receive the first calculation result, the second input terminal of the first adder ADD1 may receive the first carry data of the lower stage multiplication unit CELL, the first adder ADD1 performs an addition operation on the first calculation result and the first carry data of the lower stage multiplication unit CELL, and the first output terminal of the first adder ADD1 outputs the addition result of the first adder ADD1 as the second calculation result.

With continued reference to fig. 2, the first input end of the second adder ADD2 may input the second calculation result, the second input end of the second adder ADD2 may input the output result of the higher-order multiplication unit CELL, the second adder ADD2 performs an addition operation on the second calculation result and the output result of the higher-order multiplication unit CELL, and the addition result of the second adder ADD2 is output through the first output end of the second adder ADD2 and is output through the calculation result output end of the multiplication unit CELL.

The connection order of the multiplier and adders in this embodiment supports the operations required by the encryption process. Optionally, different data can be input to the multiplier array in different clock periods, so that two multiplication operations are realized by multiplexing one multiplier. This keeps the multiplier fully loaded throughout the iteration period, so that the throughput of the multiplier array is exploited to its limit. Illustratively, the first adder ADD1 and the second adder ADD2 are carry save adders having carry output and carry input terminals; each adder's carry input signal may be obtained by delaying its own carry output signal by two beats, i.e. the carry output is fed back into the adder's own carry input after two clock periods.
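The principle of a carry save adder can be shown with a bitwise 3:2 compression. This sketch covers only the arithmetic idea, not the two-beat feedback timing of the circuit, and the function names are hypothetical.

```python
def carry_save_add(x, y, z):
    """3:2 compression: return (sum_bits, carry_bits) with sum + carry == x + y + z."""
    sum_bits = x ^ y ^ z                               # per-bit sum without carries
    carry_bits = ((x & y) | (x & z) | (y & z)) << 1    # per-bit carries, weighted by 2
    return sum_bits, carry_bits

def accumulate(values):
    """Accumulate many operands while keeping the carries unresolved."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)
    return s + c            # a single carry-propagating addition at the very end
```

Since no carry ripples across bit positions inside carry_save_add, the delay per accumulation step does not grow with operand width, which is the property the embodiment relies on for wide additions.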

Third embodiment

The storage medium is used for storing a computer program.

The application also provides a processing terminal which comprises the information processing circuit to be processed based on the asymmetric encryption algorithm.

The embodiment provides a high-performance CRT-RSA processing terminal scheme. A high-radix, parallel multiplication array, pipelined, efficient modular multiplication circuit can be implemented on an ASIC based on classical Montgomery modular multiplication principles. And by optimizing parallel scheduling and resource multiplexing, a high-speed modular exponentiation circuit scheme supporting CRT mode and nonCRT mode scene switching is realized.

Referring to fig. 5, in an embodiment, in the technical solution of the processing terminal, the processing terminal is built layer by layer according to modular multiplication, modular exponentiation, CRT-RSA.

The modular exponentiation layer comprises all modules within the dashed box.

The ModExpCtrl module is the global state control unit and is responsible for modular exponentiation scheduling, MEM access and init control.

The CRTCtrl module is responsible for CRT flow scheduling and CRT parameter variable format adjustment to be compatible with the transmission format of the non-CRT (mod= { p, q }, exp= { dp, dq }, msg= { c, qinv }).

Parameter, i.e. the parameter storage unit, stores fixed parameters including Radix = 64, n = Size/Radix (Size being 2048 or 1024 for RSA2048/RSA1024), β = 2^Radix, R = β^n, and the corresponding CRT parameters CRT_n = n/2 and CRT_R = β^(n/2).

ExpBIT is a power exponent management unit, which is used for managing exp, arranging exp according to binary system, and outputting by left shift from most significant bit.

Initial, i.e. the initialization management unit, is responsible for initializing the algorithm circuit. It mainly schedules the Multiplier to perform a modular exponentiation in the R domain to obtain the inverse M_INV of mod, and then calculates the variant N_var of mod. The initialization process is the process of obtaining N_var.

The Preg, the first buffer unit, and the Creg, the second buffer unit, are used for buffering intermediate results of the iterative process during exponentiation.

Management, i.e. the multiplication array input management unit, manages the A and B inputs of the underlying Multiplier based on control information from ModExpCtrl, shift information from ExpBit, and the values cached in Preg/Creg.

The modular exponentiation layer takes the parameters mod, exp and msg, configured over the system bus in MEMWrapper, as input variables and executes the modular exponentiation msg^exp mod (mod). The scheme as a whole is RSA.

CRT-RSA is an acceleration scheme built on the RSA circuit. Parallel modular exponentiation is realized by laying out two instances, [0] and [1], in parallel, and the Share Multiplier module likewise follows the [0]/[1] parallel architecture, as described later.

The parameter configuration table consists of two RAMs. Two different mapping modes are provided for the host side and the engine side, enabling efficient parameter configuration and retrieval, and a system bus access interface is supported. Note that Preg and Creg buffer the intermediate temporary results of the modular exponentiation, whereas RAM here refers to the two RAM blocks inside the parameter configuration table MEMWrapper.

Configuration or data input is encoded by the host, issued to the circuit through the system bus interface, and parsed by the Decoder circuit, with the hardware automatically extracting the configuration or parameters. Output results or states are stored in the result and status address spaces of the MEMWrapper parameter table, read and encoded by the Decoder circuit, and returned to the host through the system bus interface.

The processing procedure of the scheme is divided into two stages, initialization and operation; the operation can be executed only after the initialization has finished. The flow is as follows:

In the initialization stage, the System Bus issues an init request and parameters, which are received and parsed by the Decoder and then written into the MEM parameter table. ModExpCtrl receives the initialization request, its state machine starts working, and initialization begins. Initial, i.e. the initialization management unit, first reads the initialization variable parameters from MEM, then loads ExpBit (assigned R/2-1), Preg (assigned 1) and Creg (assigned M), and calls the Share Multiplier to perform the exponentiation. The intermediate results of the iteration are cached in Preg (assigned temp), and the exponentiation finally yields the inverse M_INV of mod, cached in Preg (assigned M'). The underlying Multiplier is then called again to calculate the variant of the modulus mod, N_var = M * (M' mod β), which is saved for Montgomery modular multiplication (i.e., in mont mode, the variant N_var is used as the input N). This completes the initialization.
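The initialization arithmetic can be checked numerically with the sketch below. It rests on stated assumptions: the exponentiation with exponent R/2-1 is read as being taken modulo R (for odd M this yields the inverse of M modulo R, because the odd residues modulo R = 2^k form a group of order R/2), and the sign convention M' = -M^(-1) mod R is assumed so that N_var ≡ -1 mod β. The function name init_n_var is hypothetical.

```python
def init_n_var(m, size, radix=64):
    """Return (M_INV, N_var) for an odd modulus m of at most `size` bits."""
    n = size // radix
    r = 1 << (radix * n)                 # R = beta^n
    beta = 1 << radix
    m_inv = pow(m, r // 2 - 1, r)        # M^(R/2 - 1) mod R, the inverse of M mod R
    assert m * m_inv % r == 1
    m_prime = (-m_inv) % r               # assumed sign: M' = -M^(-1) mod R
    n_var = m * (m_prime % beta)         # N_var = M * (M' mod beta)
    return m_inv, n_var
```

With this convention N_var is a multiple of M whose lowest Radix bits are all ones, which is what allows the quotient digit Q_i to be read directly from the lowest unit of S.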

In the operation stage, the System Bus issues a calc request and parameters, which are received and parsed by the Decoder and written into the MEM parameter table. ModExpCtrl receives the operation request, its state machine starts working, and the operation flow begins. ModExpCtrl first reads the operation variable parameters from MEM.

First, Montgomery preprocessing is performed: Preg (assigned R^2) and Creg (assigned msg) are loaded, and the Share Multiplier is called to perform a mont-mult (modular multiplication) operation (i.e. S = A * B * R^(-1)). The result is cached in Creg (assigned C*R), while Preg is refreshed (assigned 1*R). Then the square-and-multiply iteration of the modular exponentiation is performed: ExpBit (assigned E, or Dp, Dq) shifts out the exponent bits, and the Share Multiplier is called repeatedly to perform mont-mult operations. The intermediate result of the iteration is cached in Preg (assigned temp P), while Creg remains unchanged (assigned C*R). Each time a square-and-multiply step completes, ExpBit shifts left by 1 bit, and the iteration ends when all exponent bits have been processed. Finally, Montgomery post-processing is performed: Preg (assigned temp P) and Creg (assigned 1) are loaded, and the Share Multiplier is called to perform a mont-mult operation (i.e. S = A * B * R^(-1)). The result is cached in Preg (assigned temp P).

In nonCRT mode, this result is the final result of RSA and is written to the result address space of the MEMWrapper. Illustratively, the exponent is examined bit by bit: when e_i is 0, P = P * P and one modular multiplication is called; when e_i is 1, P = P * P and then P = P * C, and two modular multiplications are called. Since e_i is 0 or 1 with equal probability, the estimated average number of modular multiplication calls is 1.5 times the bit width of the exponent exp.
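The 1.5x estimate can be reproduced by counting calls directly: one squaring per exponent bit plus one extra multiplication per bit that is 1. The helper name below is hypothetical, and the count assumes every bit of a fixed-width exponent register is processed.

```python
def modmul_calls(exp, width):
    """Number of modular multiplications for a width-bit square-and-multiply pass."""
    ones = bin(exp & ((1 << width) - 1)).count("1")
    return width + ones          # width squarings + one multiply per 1 bit

# Averaged over all 8-bit exponents the count is exactly 1.5 * 8 = 12.
average = sum(modmul_calls(e, 8) for e in range(256)) / 256
```

Real private exponents are not uniformly random in every bit, so the 1.5 factor should be read as an average-case estimate.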

It will be appreciated that the number of nonCRT operation cycles = Wexp * 1.5 * (2 * Wexp / Radix), where Wexp is the bit width of the exponent exp, e.g. Wexp = 1024 for RSA1024 and Wexp = 512 for CRT-RSA1024.

In CRT mode, the estimated average performance is derived from the nonCRT performance. Taking RSA1024 as an example, CRT-RSA1024 is decomposed into two RSA512 operations that run simultaneously in parallel. The time of a 512-bit Montgomery modular multiplication is taken to be 1/2 that of a 1024-bit one, and the 512-bit exponent width is likewise 1/2 of 1024, so the total operation time for 512 bits is 1/4 of that for 1024 bits. That is, CRT can achieve a 4-fold acceleration compared to nonCRT.
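The cycle estimates can be evaluated numerically as follows. The function names are hypothetical; the formulas are the ones given in the text, with the CRT case obtained by halving Wexp.

```python
def noncrt_cycles(wexp, radix=64):
    """Estimated cycles: Wexp * 1.5 modular multiplications of 2*Wexp/Radix cycles each."""
    return wexp * 1.5 * (2 * wexp / radix)

def crt_cycles(wexp, radix=64):
    """CRT estimate: two half-width exponentiations running in parallel."""
    return noncrt_cycles(wexp // 2, radix)

ratio = crt_cycles(1024) / noncrt_cycles(1024)   # 0.25
```

For RSA1024 with Radix = 64 this gives 49152 cycles in nonCRT mode and 12288 cycles in CRT mode, before counting the CRT recombination steps.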

Number of CRT operation cycles = (0.5 * Wexp) * 1.5 * (2 * (0.5 * Wexp) / Radix) = 0.25 * number of nonCRT operation cycles. The estimate can therefore be obtained by multiplying the number of nonCRT operation cycles by 0.25. At this point the results are only the intermediate results m1 (= c^dp mod p) and m2 (= c^dq mod q). The CRTCtrl module then calculates the intermediate variable (m1 - m2) mod p by scheduling the underlying modular subtraction circuit. Next, the CRTCtrl module requests two Montgomery modular multiplications from the ModExpCtrl module. The first performs preprocessing: Preg (assigned R^2) and Creg (assigned m1 - m2) are loaded, and the result is cached in Preg (assigned (m1 - m2) * R). The second performs the large-number modular multiplication: Preg remains unchanged (assigned (m1 - m2) * R), Creg is loaded (assigned qinv), and h = (qinv * (m1 - m2)) mod p is calculated and cached in Preg (assigned h). The CRTCtrl module then requests one large-number multiplication from the ModExpCtrl module: Preg (assigned h) and Creg (assigned q) are loaded and h * q is calculated, after which an adder is scheduled to calculate m = m2 + h * q. This is the final result of CRT-RSA and is written to the result address space of the MEMWrapper.
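The recombination steps above can be written out directly. This is a plain software sketch that uses Python's built-in pow in place of the Montgomery circuit; the function name is hypothetical, and qinv is assumed to be the inverse of q modulo p, as the formula for h implies.

```python
def crt_rsa_private_op(c, p, q, dp, dq, qinv):
    """Private-key operation via CRT: returns c**d mod (p*q)."""
    m1 = pow(c, dp, p)                   # first half-width exponentiation
    m2 = pow(c, dq, q)                   # second half-width exponentiation
    diff = (m1 - m2) % p                 # modular subtraction
    h = qinv * diff % p                  # h = qinv * (m1 - m2) mod p
    return m2 + h * q                    # m = m2 + h * q

# Small textbook key, for illustration only (far too small for real use).
p, q, e, d = 61, 53, 17, 2753
dp, dq, qinv = d % (p - 1), d % (q - 1), pow(q, -1, p)
cipher = pow(65, e, p * q)
plain = crt_rsa_private_op(cipher, p, q, dp, dq, qinv)
```

The value plain equals pow(cipher, d, p * q), confirming that the two half-width results plus one multiplication and one addition recover the full-width result.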

The host may obtain the results and status in the MEMWrapper via the system bus, which completes the operation stage.

As shown in FIG. 6, the host issues init (initialization packet) and calc (operation packet) instruction code packets. In the initialization stage, the host can initialize the RSA circuit (precompute the variant modulus value N_var). In the operation stage, the host schedules the RSA circuit to execute the modular exponentiation (encryption or decryption) by issuing a calc code packet. The code packet format is the encoding format of the port input: all data packets must follow this format and are divided into init and calc packets, distinguished by a flag signal.

The init coding packet and the calc coding packet are composed of two parts, namely a header and payload.

Only the init header (the header of the initialization packet) additionally carries mode and crt_en (CRT enable), which are used to configure Size (the RSA mode) and the CRT enable selection at initialization.

Each header carries an offset for indexing a specific parameter position in the RSA MEM (parameter configuration table). The launch field (start bit) it carries is the RSA start trigger: if the current packet starts the operation, launch is set to 1; if the data is split into multiple packets, launch is set to 0 in the preceding packets and to 1 in the last packet.

The Payload is a parameter variable corresponding to the offset index position, wherein the init Payload (variable data of the initialization packet) carries initialization variables including mod, exp or p, q, dp, dq, qinv information, and the calc Payload (variable data of the operation packet) carries operation variables including msg, result, status.
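The offset and launch semantics can be illustrated with a hypothetical packet model. The source gives no field widths, packet size limit or offset unit, so the Packet class, the max_words parameter and the word-granular offsets below are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Packet:
    kind: str            # "init" or "calc" (the flag distinguishing the two packet types)
    offset: int          # index of the first parameter word in the RSA MEM (assumed unit: words)
    launch: int          # 1 only on the packet that starts the operation
    payload: List[int]   # parameter words written from `offset` onward

def split_into_packets(kind, start_offset, words, max_words):
    """Split a parameter block into packets; only the last one sets launch = 1."""
    packets = []
    for i in range(0, len(words), max_words):
        is_last = i + max_words >= len(words)
        packets.append(Packet(kind, start_offset + i, 1 if is_last else 0,
                              words[i:i + max_words]))
    return packets
```

Splitting ten words into packets of at most four words yields launch values 0, 0, 1 and offsets 0, 4, 8, matching the rule that only the final packet triggers the circuit.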

It will be appreciated that RSA is essentially a large-number modular exponentiation. For the modular exponentiation msg^exp mod (mod), the plaintext message is taken as msg, the private key d or the public key e as exp, and the prime product n as mod; performing this modular exponentiation is the RSA cryptographic operation.

CRT-RSA is an optimization that accelerates RSA and is used only when RSA performs a private key d operation. The RSA parameters mod and exp and the CRT-RSA parameters p, q, dp, dq and qinv are public/private key information that is known in advance, whereas msg and result are data obtained only at the time of use. The scheme operates on the known information in advance (init), so that a large number of deterministic calculation steps are completed ahead of time; when the msg that actually needs processing is received, the remaining steps can be executed quickly based on the init results (calc). Splitting the init and calc stages greatly reduces the steps needed for, and improves the performance of, real-time msg operations.

The MEMWrapper circuit provides parameter storage and address decoding functions. It incorporates 2 blocks of single-port RAM, each of depth 80 x width 64, to match the Radix = 64 processing units of the underlying engine and to improve the internal access efficiency of CRT and nonCRT parameters.

Address decoding provides two mapping modes. On the RSA engine side, the 2 RAM blocks may be read and written simultaneously, which improves the throughput of parameter reading and writing during operation. On the host user side, the memory appears as a single RAM with contiguous parameter addresses; the user can write or read all parameters continuously by setting an initial offset, and the decoding circuit supports offset auto-increment. The two mapping modes are shown in Tables 1 (a) and 1 (b) below.

TABLE 1 (a) hardware Engine side Access parameter Mapping Table

TABLE 1 (b) host user access parameter Mapping Table

Table 1 (a) shows a mapping scheme visible to the user. The parameter variables of nonCRT and CRT differ slightly: mod corresponds to p, q; exp corresponds to dp, dq; and qinv is unique to CRT. msg and result have the same meaning in both modes.

Referring to FIGS. 2-5, the bottom layer of the Share Multiplier circuit is composed of a set of reusable Multiplier arrays that implement parallel modular multiplication based on the Montgomery algorithm.

In this scheme, step c of the Montgomery expression (see the background principles and the Montgomery modular multiplication expression for details) is further broken down into a concrete ASIC implementation:

First, a base value Radix is set, representing the bit width of a unit, and the main variables N_var, A, B and S are split by Radix into n units CELL, i.e. [N_(n-1), ..., N_i, ..., N_0], [A_(n-1), ..., A_i, ..., A_0], [B_(n-1), ..., B_i, ..., B_0] and [S_(n-1), ..., S_i, ..., S_0], where n = Size/Radix and Size is the modulus bit width of RSA (typically 1024/2048/4096). Referring to FIG. 5, N_var = N_in, A = A_in and B = B_in, and S is the name of the intermediate result inside the Multiplier circuit. S can be split into the unit array [S_(n-1), ..., S_i, ..., S_0]; it is iterated a number of times and finally output as S_out in FIG. 5.

Then, Q_i = S mod β can be realized by taking the least significant Radix bits of S at each iteration, i.e. the iteration value of unit S_0.

For A_i * B and Q_i * N_var, A * B and Q * N_var are split into the products of n units and accumulated iteratively. That is, a multiplication array is arranged by multiplication unit (CELL), the split N_var and B values are loaded in parallel, A and Q are fed alternately and serially to the CELL units, and all units execute their multiplications simultaneously, so that the problem reduces to computing A_i * B_i and Q_i * N_i in each unit. Taking the multiplication carry into account, the product of the present-stage unit CELL(i) must be added to the carry of the lower-stage unit CELL(i-1). That is, A_i * B_i of a single unit should be implemented as LSB(A_i * B_i) + MSB(A_(i-1) * B_(i-1)), and likewise Q_i * N_i should be implemented as LSB(Q_i * N_i) + MSB(Q_(i-1) * N_(i-1)).

For S/β and (Q * N_var)/β: since β = 2^Radix, the division means that all units CELL of S are shifted right by Radix, which is exactly equivalent to a carry from stage i+1 to stage i. The S/β implementation therefore reduces to a carry from the single unit S_(i+1) down to S_i, and the (Q * N_var)/β implementation reduces to a carry from (Q_(i+1) * N_(i+1)) down to (Q_i * N_i).

Finally, all A_i are fed in sequence in the order A_0, ..., A_(n-1) (in the implementation, Q_i need not be fed in, since each Q_i equals the current S_0). The resulting outputs of all units CELL, [S_(n-1), ..., S_i, ..., S_0], form the Montgomery modular multiplication result S = A * B * R^(-1) mod N.

The port and circuit design of CELL is shown in fig. 2. Because Montgomery formula c.2 requires two multiplication operations to be performed and then summed, the underlying multiplication array CELL unit circuit only instantiates 1 multiplier and 2 carry save adders in order to reduce resource overhead. Illustratively, in the embodiment of fig. 2, which is a circuit configuration of two multiplication units CELL of the bottom CELL array, such CELLs total n=2048/64=32, because RSA2048 is implemented at the highest and the array is grouped by radio=64.

The multiplier uses the standard-library multiplier (MULT) that is optimal in area and timing. The adder uses 2-beat carry save, and the saved carry serves exactly as the carry input for the next A_i, which resolves the carry-cascade problem of large-number addition.

Illustratively, the first clock period (even) is an even cycle and the second clock period (odd) is an odd cycle. B and N_var are spread across the multiplication units CELL as parallel inputs and remain unchanged throughout the iteration. A and Q are input alternately: A_i on even cycles and Q_i on odd cycles. This multiplexes a single multiplier to perform both multiplications, and each multiplier is kept continuously busy during an iteration cycle, which pushes the throughput of the multiplier array to its limit.

It will be appreciated that, at a given frequency, the maximum delay of the logic within one clock cycle must satisfy the setup-time and hold-time requirements; otherwise the circuit design will have timing problems. The logic complexity of a circuit tends to limit its clock frequency, and a well-structured logic design allows a higher frequency and therefore higher performance.

Referring to figs. 2-5, the processing terminal is designed as a pipeline structure, which effectively relieves the timing problem of the large-bit-width multiplier, while the carry save adder removes the serial logic of large-number addition carries, so that a high radix becomes possible. The high-radix scheme for RSA2048 (backward compatible with RSA1024) is therefore implemented with radix = 64, and the working frequency achieved on the ASIC exceeds 600 MHz. The benefit of the pipeline design is that the bit width of the base unit is increased, the input cycle overhead is reduced, and the working frequency is raised, so the operation efficiency is greatly improved.

Illustratively, with the circuit implementation of this embodiment, after each multiplication completes, part of a non-zero carry result may remain saved in the bottom-layer registers; this is called the residue, and the circuit performs a final processing step for it. As shown in FIG. 7, the array scheme iterates over the inputs A[0, 1, …, n-1]. After the final A_{n-1} has been input, a supplementary step checks whether all carry save registers in the CELLs are 0, to guard against carry residue left in the CELL adders. If they are all 0, there is no carry residue and S[0, 1, …, n-1] is output directly. Otherwise one more iteration (the nth round) is performed, in which A_n, Q_{n+1}, B, and N are all input as 0 and the adder carry-save value of each CELL stage is shifted left into the carry save register of the previous stage. Here, the carry save register is the register that saves the carry-out (co) of ADD1 and ADD2; as an illustration, it corresponds to the two small squares beside the first adder ADD1 and the second adder ADD2 in fig. 2. When this additional nth iteration completes, the circuit again checks whether any CELL still holds a carry, and repeats the procedure above if so. Experiments show that one supplementary iteration is normally enough to clear the residue, and the probability of needing several supplementary iterations is very small.
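
The check-and-iterate idea can be illustrated with a simplified Python model. This is only a sketch under loud assumptions: the name `flush_carry_residue`, the representation of each stage as a word plus one pending carry destined for the next-higher word, and the extra top word are all hypothetical, and the model ignores the per-iteration shift by β that the real array performs. It shows only the control flow: test whether every carry register is 0, and if not, run another round with zero operands so that only the saved carries move.

```python
def flush_carry_residue(words, carries, radix_bits):
    """Absorb pending carries into the words; return (words, extra_rounds).

    words[j] is the value held by stage j; carries[j] is a saved carry that
    still has to be added into stage j+1. Assumes the total value fits in
    len(words) + 1 words.
    """
    mask = (1 << radix_bits) - 1
    words = list(words) + [0]          # one extra word to receive the top carry
    carries = list(carries) + [0]
    rounds = 0
    while any(carries):                # "are all carry save registers 0?"
        new_words, new_carries = [], []
        for j, w in enumerate(words):
            incoming = carries[j - 1] if j > 0 else 0
            total = w + incoming       # zero operands: only the carry is added
            new_words.append(total & mask)
            new_carries.append(total >> radix_bits)
        words, carries = new_words, new_carries
        rounds += 1
    return words, rounds
```

A long ripple such as a carry entering a word that is already all ones needs more than one round in this model; for typical data a single round is enough, which is consistent with the observation above.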

This embodiment designs an ASIC implementation of RSA2048 (backward compatible with RSA1024). From a resource perspective, the maximum Size = 2048 and radix = 64, so n = Size/radix = 32; that is, the modular multiplication array has 32 CELL units, each processing a 64-bit word (i.e., each invokes one 64-bit multiplier and two 64-bit adders). Both the nonCRT and CRT modes are supported, so the modular multiplication array has two deployments. In nonCRT mode, typically referred to as public key operation, the 32 CELL units are cascaded in sequence, corresponding to one set of 2048-bit latch registers B and N and one 2048-bit shift register A; the number of valid bits depends on Size = 1024 or 2048. For example, RSA2048 occupies both groups CELL[1] and CELL[0] together with A, B, N[1] and [0], while RSA1024 occupies only CELL[0] and A, B, N[0]. In CRT mode, generally referred to as private key operation, the CELL units are split into 2 independent parallel groups of 16 cascaded CELL units each, corresponding to two 1024-bit latch registers each for B (B[0], B[1]) and N (N[0], N[1]) and two 1024-bit shift registers A[0], A[1]. CRT reuses the same resources as nonCRT: it divides one large modular multiplication into two half-size modular multiplications and provides parallel modular multiplication to the upper-layer modular exponentiation, for a speedup of up to 4 times.

Fig. 8 is a schematic diagram of the nonCRT modular multiplication array of the embodiment of fig. 2 of the present application. Fig. 9 is a schematic diagram of the CRT modular multiplication array of the embodiment of fig. 2 of the present application.

Referring to figs. 8 and 9, in this embodiment the modular multiplication array is compatible with both the nonCRT and CRT deployments. The uppermost layer is RSA scheduling, as shown in FIG. 5, which mainly implements initialization, modular exponentiation, and parameter access. As mentioned above, the RSA of the present scheme is divided into two stages: an initialization stage and an operation stage. The initialization stage pre-computes the variant modulus N_var = N × (N' mod β), where N is the known initial modulus and β = 2^radix is also known, so the key is to solve for N'.

It should be noted that the underlying multiplication array shown in fig. 2 is the main resource and therefore needs to be heavily multiplexed. It can serve either as a Montgomery modular multiplier or as a large-number multiplier. Montgomery modular multiplication first requires computing the modulus variant N_var, so the underlying multiplication array is first multiplexed as a large-number multiplier to compute N_var. Completion of initialization indicates that Montgomery modular multiplication is available. On entering the operation stage, the underlying multiplication array is switched to Montgomery modular multiplication to provide the modular multiplication operations.

The underlying multiplication array shown in fig. 2 not only implements Montgomery modular multiplication; it is highly multiplexed, and almost all required multiplications are scheduled onto it, such as the large-number multiplications for initialization and CRT, or partial multiplications of the form A_i*B, which involve multiple internal variables.

Montgomery's principle requires (-N×N') mod R = 1, where R = β^(Size/radix), i.e. R = 2^Size. Mathematically, modular inversion modulo a power of two can be converted into a modular exponentiation, namely N' = R - N^(R/2-1) mod R. Taking a value modulo 2^Size is equivalent to keeping its lower Size bits, so this can be rewritten as N' = R - N^(R/2-1)[Size-1:0]. Solving for N_var during initialization is thus reduced to an exponentiation, which is easily implemented with multiplications using the L-R binary expansion method. As used above, primary resource multiplexing refers to the sharing of the multiplier array between the nonCRT and CRT modes (multiplexing across different computation processes), while reducing the initialization of N_var to large-number multiplication is a further multiplexing of the multiplier array (multiplexing within a single computation). By modifying the logical cascade relationship and the usage of the modular multiplication CELL array, large-number multiplication can be realized, as shown in fig. 10.
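
The identity N' = R - N^(R/2-1) mod R can be verified with a short Python sketch. The function names `modexp_lr_low_bits` and `montgomery_n_prime` are hypothetical, and plain Python integers stand in for the multiplication array; the sketch only illustrates the arithmetic, with the modulo-R step implemented as keeping the lower Size bits.

```python
def modexp_lr_low_bits(base, exponent, size_bits):
    """Left-to-right binary exponentiation, keeping only the lower size_bits bits."""
    mask = (1 << size_bits) - 1            # "mod 2**size_bits" == take low bits
    result = 1
    for bit in bin(exponent)[2:]:          # scan exponent bits from MSB to LSB
        result = (result * result) & mask  # square for every bit
        if bit == "1":
            result = (result * base) & mask  # multiply when the bit is 1
    return result


def montgomery_n_prime(n, size_bits):
    """Return N' such that (-n * N') mod R == 1, with R = 2**size_bits and n odd."""
    r = 1 << size_bits
    n_inverse = modexp_lr_low_bits(n, r // 2 - 1, size_bits)  # N^(R/2-1) mod R
    return r - n_inverse
```

The approach works because every odd N satisfies N^(R/2) ≡ 1 (mod R) when R is a power of two, so N^(R/2-1) is the inverse of N modulo R.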

In RSA operation stage, modular exponentiation is performed, and modular exponentiation is converted into modular multiplication according to the L-R binary expansion method.

Because the underlying modular multiplication circuit is based on the Montgomery algorithm, it can only compute S = A×B×R^(-1) mod N. To implement the modular exponentiation P = C^d mod N, the modular multiplication circuit must therefore be scheduled for pre-processing and post-processing in addition to the squarings and multiplications, as shown in table 2.

It should be noted that the pre-processing multiplies the initial input by the R value to convert it into the Montgomery representation. The result SR obtained from the exponent iteration of the L-R binary expansion method is still in the Montgomery representation. The post-processing removes the R value from the SR result to obtain the final target result S.
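
The pre-processing, L-R iteration, and post-processing can be sketched in Python as follows. This is an assumption-laden illustration rather than the steps of table 2: `mont_mul_ref` is a hypothetical software stand-in for the modular multiplication array, and using a precomputed R^2 mod N for the conversion into the Montgomery representation is a common technique assumed here, not something stated in the text.

```python
def mont_mul_ref(a, b, n, r):
    """Software stand-in for the array: returns a*b*R^(-1) mod n (Python 3.8+)."""
    return a * b * pow(r, -1, n) % n


def mont_modexp(c, d, n, size_bits):
    """Compute c**d mod n using only Montgomery products (n odd, n < 2**size_bits)."""
    r = 1 << size_bits
    r2 = r * r % n                              # assumed precomputed constant
    c_bar = mont_mul_ref(c, r2, n, r)           # pre-processing: C*R mod N
    s_bar = mont_mul_ref(1, r2, n, r)           # 1 in Montgomery form: R mod N
    for bit in bin(d)[2:]:                      # L-R: exponent bits, MSB first
        s_bar = mont_mul_ref(s_bar, s_bar, n, r)      # square
        if bit == "1":
            s_bar = mont_mul_ref(s_bar, c_bar, n, r)  # multiply
    return mont_mul_ref(s_bar, 1, n, r)         # post-processing: remove R
```

Multiplying by 1 in the last step is what strips the remaining factor R, since a Montgomery product always divides by R once.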

TABLE 2 Montgomery modular exponentiation processing steps

For CRT and nonCRT, the key representations differ: the nonCRT private key is (n, d), and the CRT private key is (p, q, dp, dq, qinv). The parameters and flows of the two therefore differ slightly in the initialization and operation stages. In nonCRT mode, the initialization stage solves for N_var (e.g., 1024 bits) and the operation stage solves P = C^d mod n. In CRT mode, the initialization stage solves for P_var (e.g., 512 bits) and Q_var (e.g., another 512 bits), and the operation stage solves m1 = C^dp mod p, m2 = C^dq mod q, h = qinv×(m1 - m2) mod p, and P = m2 + h×q. Here m1, m2, h, and P are all computed by multiplexing the multiplication array, and m1 and m2, which account for most of the computation time, are solved fully independently in parallel.
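
The CRT operation stage can be sketched in Python as follows. The function name `rsa_crt_private_op` is hypothetical, Python's built-in `pow` stands in for the Montgomery-based modular exponentiation, and m1 and m2 are computed one after the other here, whereas the circuit computes them in parallel on the two CELL groups.

```python
def rsa_crt_private_op(c, p, q, dp, dq, qinv):
    """Return c**d mod (p*q) from the CRT private key (p, q, dp, dq, qinv)."""
    m1 = pow(c, dp, p)             # half-size exponentiation modulo p
    m2 = pow(c, dq, q)             # half-size exponentiation modulo q
    h = qinv * (m1 - m2) % p       # Python's % keeps h non-negative
    return m2 + h * q              # recombination
```

For a key with d given, dp = d mod (p-1), dq = d mod (q-1), and qinv = q^(-1) mod p; for example, with p = 61, q = 53, and d = 2753, the result matches `pow(c, d, p * q)`.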

In terms of private key operation performance, CRT-RSA achieves a 4-times speedup over nonCRT. The performance evaluation is shown in table 3. The present solution aims to implement an ASIC-based CRT-RSA circuit with maximum performance and minimum area.

Table 3 New technical scheme Performance evaluation (private Key operation)

Fourth embodiment

This method adopts high-radix, parallel, pipelined modular multiplication and optimizes CRT parallelism and multiplexing, which significantly improves the operating efficiency of the RSA circuit while optimizing resources. The pipelined multiplication array increases both the radix size and the working frequency, significantly improving modular multiplication efficiency. In addition, the global operation logic is deployed in parallel, the underlying resources are highly multiplexed, and the host configuration mode is optimized, realizing a high-performance CRT-RSA circuit on an ASIC.

In this application, step numbers such as S100 and S200 are used for the purpose of more clearly and briefly describing the corresponding content, and are not to constitute a substantial limitation on the sequence, and those skilled in the art may execute S200 first and then S100 when implementing the present invention, but these are all within the scope of protection of the present application.

The embodiments of the system and the storage medium provided in the present application may include all the technical features of any one of the embodiments of the method, where the expansion and explanation of the description are substantially the same as those of each embodiment of the method, and are not repeated herein.

The present embodiments also provide a computer program product comprising computer program code which, when run on a computer, causes the computer to perform the method in the various possible implementations as above.

The embodiments also provide a chip including a memory for storing a computer program and a processor for calling and running the computer program from the memory, so that a device on which the chip is mounted performs the method in the above possible embodiments.

It can be understood that the above scenario is merely an example, and does not constitute a limitation on the application scenario of the technical solution provided in the embodiments of the present application, and the technical solution of the present application may also be applied to other scenarios. For example, as one of ordinary skill in the art can know, with the evolution of the system architecture and the appearance of new service scenarios, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.

The foregoing embodiment numbers of the present application are merely for describing, and do not represent advantages or disadvantages of the embodiments.

The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.

The units in the device of the embodiment of the application can be combined, divided and pruned according to actual needs.

In this application, the same or similar terms, concepts, technical solutions, and/or application scenario descriptions are generally described in detail only where they first appear and, for brevity, are not repeated later. When reading later passages that omit these details, reference may be made to the earlier detailed description.

In this application, the descriptions of the embodiments are focused on, and the details or descriptions of one embodiment may be found in the related descriptions of other embodiments.

The technical features of the technical solutions of the present application may be arbitrarily combined, and for brevity of description, all possible combinations of the technical features in the above embodiments are not described, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the present application.

The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the claims, and all equivalent structures or equivalent processes using the descriptions and drawings of the present application, or direct or indirect application in other related technical fields are included in the scope of the claims of the present application.

Claims

1. An encryption processing method, comprising:

decrypting or signing the information to be processed according to the operation result;

the multiplication array comprises a plurality of multiplication units which are sequentially cascaded, the Montgomery modular multiplication expression is based, at least two multiplication arrays are called, and the step of calculating the modular multiplication operation comprises the following steps:

in response to obtaining a preset base value representing the CELL bit width of the multiplication CELL, splitting a calculation variable in the modular multiplication operation according to the preset base value, and inputting data of different stages into each stage of multiplication CELL in the cascade multiplication CELLs so as to respectively and correspondingly input the data into each multiplication CELL for iterative operation.

2. The encryption processing method according to claim 1, wherein the key includes a nonCRT key; the step of invoking at least two multiplication arrays based on Montgomery modular multiplication expressions to calculate the modular multiplication operation comprises the following steps:

3. The encryption processing method according to claim 1, wherein the key includes a CRT key; the step of invoking at least two multiplication arrays based on Montgomery modular multiplication expressions to calculate the modular multiplication operation comprises the following steps:

4. The encryption processing method according to claim 1, wherein the step of splitting the calculation variables in the modular multiplication operation according to the preset base values to be respectively and correspondingly input to each multiplication unit for iterative operation in response to obtaining the preset base values representing the unit bit widths of the multiplication units comprises:

for the operation of the complementary module in the Montgomery modular multiplication expression, in each iterative calculation of the multiplication array, the iterative value of the output result of the least-significant multiplication unit is taken to realize;

and/or,

for multiplication in the Montgomery modular multiplication expression, parallel importing split calculation variables according to a multiplication array of a plurality of multiplication units which are sequentially cascaded, so that the multiplication units of the multiplication array execute multiplication at the same time; and based on multiplication calculation of the plurality of multiplication units, adding the product of each stage of multiplication unit with the carry of the lower stage of multiplication unit to realize the multiplication operation;

and/or,

for a division operation, in each iterative calculation of the multiplication array, the output result of each multiplication unit carries to a lower-stage multiplication unit to realize the division operation.

5. The encryption processing method according to claim 1, wherein the step of splitting the calculation variable in the modular multiplication operation according to the preset base value to respectively correspond to the input to each multiplication unit for iterative operation in response to obtaining the preset base value representing the unit bit width of the multiplication unit, and then comprises:

6. The encryption processing method according to any one of claims 1 to 5, wherein the step of calling at least two multiplication arrays based on the montgomery modular multiplication expression to calculate the modular multiplication operation includes:

7. An encryption processing circuit is characterized by comprising a modular multiplication module, a modular exponentiation control module and a parameter configuration module;

the parameter configuration module is used for generating the configuration parameters based on the calculation requirement of an asymmetric encryption algorithm and the Montgomery modular multiplication expression, and receiving the modular exponentiation operation result;

the multiplication array comprises a plurality of multiplication units which are sequentially cascaded, the encryption processing circuit obtains a preset base value representing the unit bit width of the multiplication units, the calculated variable in the modular multiplication operation is split according to the preset base value, and in the cascaded multiplication units, each stage of multiplication unit CELL inputs data of different stages so as to respectively and correspondingly input the data to each multiplication unit for iterative operation.

8. The encryption processing circuit of claim 7, wherein the modular multiplication module comprises at least two multiplication arrays;

9. The encryption processing circuit of claim 8, wherein the modular exponentiation control module comprises a global state control unit coupled to the at least two multiplication arrays for modular exponentiation scheduling, parameter reading and writing, and initialization process control of the at least two multiplication arrays.

10. The encryption processing circuit of claim 9, wherein the modular exponentiation control module further comprises a multiplication array input management unit coupled to the at least two multiplication arrays for managing factor inputs of the multiplication arrays;

and/or,

the modular exponentiation control module further comprises a flow scheduling unit, wherein the flow scheduling unit is connected with the global state control unit and is used for adjusting the configuration parameter format and performing operation flow scheduling;

and/or,

the modular exponentiation control module further comprises an exponentiation management unit, wherein the exponentiation management unit is connected with the multiplication array input management unit and is used for arranging exponentiation according to binary system so as to output the exponentiation in a bitwise sequence in the operation process;

and/or,

the modular exponentiation control module further comprises an initialization management unit, wherein the initialization management unit is connected between the global state control unit and the at least two multiplication arrays and is used for scheduling the at least two multiplication arrays to perform initialization operation so as to obtain a variant of a modular parameter;

and/or,

the modular exponentiation control module further comprises a first buffer unit and a second buffer unit, wherein the first buffer unit and the second buffer unit are respectively connected with the global state control unit, the first buffer unit is used for buffering a first factor of the multiplication array operation, and the second buffer unit is used for buffering a second factor of the multiplication array operation;

and/or,

the modular exponentiation control module further comprises a parameter storage unit, wherein the parameter storage unit is respectively connected with the first buffer unit and the second buffer unit and is used for storing fixed parameters and operation flow parameters of the encryption processing circuit.

11. The encryption processing circuit according to claim 7, wherein the parameter configuration module includes a parameter configuration table unit, and the parameter configuration table unit is connected to the modular exponentiation control module, and is configured to receive and store configuration parameters sent by a host side for reading by the modular exponentiation control module; and receiving and storing the operation result of the modular exponentiation control module for the host side to call.

12. The encryption processing circuit of claim 11, wherein the parameter configuration table unit comprises at least two blocks of memory that can be read and written simultaneously by the modular exponentiation control module in a first memory management mode; in the second storage management mode, the at least two blocks of memories are encoded into memories with continuous parameter addresses for the host side to sequentially read and write.

13. The encryption processing circuit according to any one of claims 7 to 12, wherein the multiplication array comprises a plurality of multiplication units in cascade in order, each multiplication unit comprising a multiplier, a first adder, and a second adder connected in order;

14. A processing terminal comprising an interconnected processor and storage medium, wherein:

the storage medium is used for storing a computer program;

the processor is configured to read the computer program and execute the computer program to implement the encryption processing method according to any one of claims 1 to 6;

and/or,

the processing terminal comprising an encryption processing circuit according to any one of claims 7-13.

15. A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the encryption processing method according to any one of claims 1-6.