CN117234458B

CN117234458B - Multiplication array, data processing method, processing terminal and storage medium

Info

Publication number: CN117234458B
Application number: CN202311485484.9A
Authority: CN
Inventors: 任培培; 郭超
Original assignee: Shenzhen Dapu Microelectronics Co Ltd
Current assignee: Shenzhen Dapu Microelectronics Co Ltd
Priority date: 2023-11-09
Filing date: 2023-11-09
Publication date: 2024-02-23
Anticipated expiration: 2043-11-09
Also published as: CN117234458A

Abstract

The application provides a multiplication array, a data processing method, a processing terminal and a storage medium. The multiplication array comprises a plurality of sequentially cascaded multiplication units, and each multiplication unit comprises a first multiplier, a first adder and a second adder. The first output end of the first multiplier is connected to the first input end of the first adder. The second input end of the first adder receives first carry data, and the output end of the first adder is connected to the first input end of the second adder. The second input end of the second adder receives the output result of the higher-stage multiplication unit, so that the second adder outputs the output result of the present-stage multiplication unit. In this application, one multiplier in each multiplication unit is multiplexed to perform two multiplication operations, so the throughput of the multiplier array can be fully exploited. In addition, the multiplication units can adopt a pipeline structure, which raises the operating frequency and greatly improves computational efficiency.

Description

Multiplication array, data processing method, processing terminal and storage medium

Technical Field

The present disclosure relates to the field of data processing technologies, and in particular, to a multiplication array, a data processing method, a processing terminal, and a storage medium.

Background

RSA (an asymmetric encryption algorithm) is currently the most widely used public key cryptosystem. Its core operation is modular exponentiation of large integers, typically at key sizes such as RSA1024 and RSA2048, and as security requirements rise, RSA4096 is becoming increasingly common. Large-integer RSA is very hard to break, but encryption and decryption are computationally expensive and slow, and this cost has become the bottleneck restricting RSA applications. There is therefore great interest in efficient implementation techniques for large-integer modular exponentiation.

At the algorithmic level, RSA can be progressively decomposed into simpler operations: modular exponentiation reduces to modular multiplication, which in turn reduces to multiplication, addition and shifting, all of which suit hardware implementation. A concrete circuit, however, must balance resource usage against efficiency. Most existing schemes have low parallelism or only small parallel units, run at low clock frequencies, and fall short in resource multiplexing and parallel scheduling.
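The decomposition from modular exponentiation to modular multiplication can be illustrated with a short sketch: left-to-right square-and-multiply turns one exponentiation into a sequence of modular multiplications, which is the operation a multiplication array would accelerate. This is a generic textbook method, not the circuit of this application; `mod_mul` is a hypothetical software stand-in for a hardware modular multiplier, and the RSA parameters are toy values.

```python
def mod_mul(a: int, b: int, n: int) -> int:
    """Hypothetical stand-in for a hardware modular multiplier: (a * b) mod n."""
    return (a * b) % n


def mod_exp(base: int, exp: int, n: int) -> int:
    """Left-to-right square-and-multiply: one squaring per exponent bit,
    plus one extra multiplication for every 1 bit."""
    result = 1 % n
    base %= n
    for bit in bin(exp)[2:]:  # exponent bits, most significant first
        result = mod_mul(result, result, n)      # square
        if bit == "1":
            result = mod_mul(result, base, n)    # multiply
    return result


# Toy RSA parameters (far too small for real use): n = 61 * 53, e = 17, d = 2753.
ciphertext = mod_exp(65, 17, 3233)
plaintext = mod_exp(ciphertext, 2753, 3233)
print(ciphertext, plaintext)  # 2790 65
```

Every step of the loop is a modular multiplication, so the speed of the multiplier dominates the cost of the whole exponentiation.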

Disclosure of Invention

To alleviate the above problems, the present application provides a multiplication array of an encryption processing circuit, comprising a plurality of sequentially cascaded multiplication units, where the multiplication units are used to perform iterative operations on the calculation variables of a modular multiplication operation;

The multiplication unit comprises a first multiplier, a first adder and a second adder, wherein the first multiplier is used for carrying out multiplication operation on a pair of variable multipliers, a first input end of the first multiplier is used for receiving a first variable multiplier in the pair of variable multipliers, a second input end of the first multiplier is used for receiving a second variable multiplier in the pair of variable multipliers, and a first output end of the first multiplier is connected with a first input end of the first adder so as to output a first calculation result to the first adder;

the second input end of the first adder is used for receiving first carry data of multiplication operation of a lower-stage multiplication unit, so that the first adder performs first addition operation on the first calculation result and the first carry data, and the output end of the first adder is connected with the first input end of the second adder so as to output a second calculation result to the second adder;

the second input end of the second adder is used for receiving the output result of the higher-level multiplication unit, so that the second adder performs a second addition operation on the second calculation result and the output result of the higher-level multiplication unit to output the output result of the current-level multiplication unit.
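As a behavioural illustration of the connections just described, the following Python sketch models one multiplication unit for a single pair of operands. Assumptions not stated in the text: the unit bit width is a parameter `w`, the first calculation result is the low `w` bits of the product and the first carry data is the high part, and both adders are modeled as plain integer additions rather than carry-save hardware. The function name is hypothetical.

```python
from typing import Tuple


def cell_step(x: int, y: int, carry_from_lower: int, s_from_higher: int,
              w: int) -> Tuple[int, int]:
    """One multiplication unit (hypothetical behavioural model).

    Returns (output_result, carry_to_higher).
    """
    mask = (1 << w) - 1
    product = x * y                                   # first multiplier
    first_result = product & mask                     # first output end -> first adder
    carry_to_higher = product >> w                    # second output end -> higher stage
    second_result = first_result + carry_from_lower   # first adder
    output = second_result + s_from_higher            # second adder
    return output, carry_to_higher


out, carry = cell_step(0xFFFF, 0xFFFF, 1, 2, 16)
print(hex(out), hex(carry))  # 0x4 0xfffe
```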

Optionally, the first multiplier is connected to the higher-stage multiplication unit through a second output end, so as to output the first carry data of the multiplication operation of the present-stage multiplication unit to the higher-stage multiplication unit.

Optionally, the multiplication array of the encryption processing circuit further includes a first selector, the first selector is connected in series between an output end of the first adder and a first input end of the second adder, the first input end of the first selector is connected to the output end of the first adder, the second input end of the first selector receives a second calculation result of a higher-order multiplication unit, and the output end of the first selector is connected to the first input end of the second adder, so that the second adder performs the second addition operation on the second calculation result of the higher-order multiplication unit.

Optionally, the multiplication array of the encryption processing circuit further comprises a second selector connected in series between a second input and an output of the second adder;

the first input end of the second selector receives the output result of the higher-stage multiplication unit, the second input end of the second selector is connected with the output end of the second adder, and the output end of the second selector is connected with the second input end of the second adder so that the second adder performs the second addition operation on the output result of the present-stage multiplication unit.

Optionally, the multiplication array of the encryption processing circuit further comprises a third selector connected in series with the output end of the second adder: the first input end of the third selector is connected with the output end of the second adder, the second input end of the third selector is set to zero, and the output end of the third selector is connected with the lower-stage multiplication unit.

Optionally, the second adder performs a second addition operation on the second calculation result and the output result of the higher-order multiplication unit at the first timing, and performs a second addition operation on the output result of the present-order multiplication unit and the second calculation result of the higher-order multiplication unit at the second timing.

Optionally, the multiplication array of the encryption processing circuit further comprises a first register;

the first register is connected in series with the first input end of the first multiplier so as to receive and temporarily store the first variable multiplier;

and/or,

the multiplication array of the encryption processing circuit further comprises a second register;

the second register is connected in series with the second input end of the first multiplier so as to receive and temporarily store the second variable multiplier;

and/or,

the multiplication array of the encryption processing circuit further comprises a third register;

The third register is connected in series between the first input end of the first adder and the first output end of the first multiplier so as to receive and temporarily store the first calculation result;

and/or,

the multiplication array of the encryption processing circuit further comprises a fourth register; the fourth register is connected in series with the second input end of the first adder so as to receive and temporarily store the first carry data of the multiplication operation of the low-level multiplication unit;

and/or,

the multiplication array of the encryption processing circuit further comprises a fifth register, wherein the fifth register is connected in series with the output end of the second adder so as to receive and temporarily store the output result;

and/or,

the multiplication array of the encryption processing circuit further comprises a sixth register which is connected in series between the output end of the second adder and the second input end of the second adder so as to receive and temporarily store the output result.

Optionally, the first adder is a carry save adder comprising two stages of registers connected in series, which receive and temporarily store the second carry data of the first addition operation and latch it for two clock cycles;

and/or, the second adder is a carry save adder comprising two stages of registers connected in series, which receive and temporarily store the third carry data of the second addition operation and latch it for two clock cycles.
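The carry-save principle behind the first and second adders can be shown in a few lines: a 3:2 carry-save adder reduces three operands to a sum vector and a carry vector without propagating carries, so every bit position is computed independently and no long carry chain appears; the final carry-propagate addition is deferred. This is a generic Python model with hypothetical function names; it does not model the two-stage carry registers of this application.

```python
from typing import Iterable, Tuple


def carry_save_add(a: int, b: int, c: int) -> Tuple[int, int]:
    """3:2 compression: returns (sum_vec, carry_vec) with a + b + c == sum_vec + carry_vec."""
    sum_vec = a ^ b ^ c                              # bitwise sum without carries
    carry_vec = ((a & b) | (a & c) | (b & c)) << 1   # majority bits, moved one place up
    return sum_vec, carry_vec


def csa_accumulate(values: Iterable[int]) -> Tuple[int, int]:
    """Accumulate non-negative operands in redundant (sum, carry) form."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)
    return s, c


s, c = csa_accumulate([0x1234, 0xFFFF, 0x0F0F, 0x8001])
print(s + c == 0x1234 + 0xFFFF + 0x0F0F + 0x8001)  # True: one carry-propagate add at the end
```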

The application also provides a data processing method, applied to the above multiplication array of the encryption processing circuit, for implementing modular multiplication in Montgomery representation, comprising the following steps:

setting a preset base value, wherein the preset base value is used for representing the unit bit width of a multiplication unit;

splitting a calculated variable of the modular multiplication operation corresponding to the Montgomery expression according to the preset basic value so as to correspond to the unit bit width of each multiplication unit in the multiplication array;

and importing the split calculated variable into a plurality of corresponding multiplication units so as to enable the multiplication units to calculate in parallel and realize the modular multiplication operation.
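The splitting step can be sketched as follows, assuming (for illustration only) that the preset base value is a power of two `2**w`, so that splitting a calculation variable means cutting it into `w`-bit words, one per multiplication unit, with index 0 going to the least significant unit. The function names are hypothetical.

```python
from typing import List


def split_limbs(x: int, w: int, count: int) -> List[int]:
    """Split non-negative x into `count` words of w bits, least significant first."""
    if x < 0 or x >> (w * count):
        raise ValueError("value does not fit in count * w bits")
    mask = (1 << w) - 1
    return [(x >> (w * i)) & mask for i in range(count)]


def join_limbs(limbs: List[int], w: int) -> int:
    """Inverse of split_limbs."""
    return sum(limb << (w * i) for i, limb in enumerate(limbs))


words = split_limbs(0x123456, 8, 4)
print([hex(v) for v in words])  # ['0x56', '0x34', '0x12', '0x0']
```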

Optionally, the Montgomery modular multiplication includes a modulo operation; the step of importing the split calculation variables into the corresponding multiplication units so that the multiplication units calculate in parallel to implement the modular multiplication operation comprises:

during the modulo operation, in each iteration of the multiplication array, taking the iteration value from the output result of the least significant multiplication unit.
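To show why the iteration value can be taken from the least significant unit alone, the following sketch implements textbook word-serial Montgomery multiplication with radix `r = 2**w`. This generic algorithm is assumed here for illustration and is not necessarily the exact schedule of this application. In each iteration the quotient digit `q` depends only on the lowest word of the running sum `s`, and adding `q * n` makes that word zero, so the sum can be divided exactly by `r`. The sketch uses Python integers instead of an explicit word array and relies on `pow(n, -1, r)` (Python 3.8+) for the modular inverse.

```python
def montgomery_mul(a: int, b: int, n: int, w: int, count: int) -> int:
    """Return a * b * R^-1 mod n with R = 2**(w * count).

    Requires n odd and 0 <= a, b < n < R.
    """
    r = 1 << w
    mask = r - 1
    n_prime = (-pow(n, -1, r)) % r          # -n^-1 mod r, precomputed once
    s = 0
    for j in range(count):
        a_j = (a >> (w * j)) & mask         # j-th word of a
        s += a_j * b
        q = ((s & mask) * n_prime) & mask   # uses only the least significant word of s
        s += q * n                          # lowest word of s becomes zero
        s >>= w                             # exact division by r
    return s - n if s >= n else s           # s < 2n, so one subtraction suffices


n, w, count = 3233, 8, 2                     # R = 2**16 > n
R = 1 << (w * count)
a, b = 1234, 2345
print(montgomery_mul(a, b, n, w, count) == (a * b * pow(R, -1, n)) % n)  # True
```

Operands in Montgomery representation are `x * R mod n`; multiplying two such values with `montgomery_mul` keeps the result in the same representation.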

Optionally, the Montgomery modular multiplication includes a multiplication operation; the step of importing the split calculation variables into the corresponding multiplication units so that the multiplication units calculate in parallel to implement the modular multiplication operation comprises:

in the process of carrying out the multiplication operation, arranging a multiplication array according to a plurality of multiplication units which are sequentially cascaded, and executing multiplication calculation in parallel;

and, based on the multiplication calculations of the plurality of multiplication units, adding the product of each stage's multiplication unit to the carry of the lower-stage multiplication unit to implement the multiplication operation.
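The multiplication step, in which each stage's product is added to the carry from the lower stage, can be modeled as a chain of units multiplying a multi-word operand by one word. Assumptions for illustration: each product splits into a low `w`-bit part kept locally and a high part passed to the higher stage, and the adders are modeled as exact integer additions. The function names are hypothetical.

```python
from typing import List, Tuple


def chain_multiply_word(a_limbs: List[int], b: int, w: int) -> Tuple[List[int], int]:
    """Multiply a multi-word operand (least significant word first) by one word b.

    Unit i outputs low(a_i * b) + high(a_{i-1} * b); the top unit's high part
    is returned separately as the final carry.
    """
    mask = (1 << w) - 1
    outputs = []
    carry_from_lower = 0
    for a_i in a_limbs:   # each unit needs only its neighbour's product, so units can run in parallel
        product = a_i * b
        outputs.append((product & mask) + carry_from_lower)
        carry_from_lower = product >> w
    return outputs, carry_from_lower


def value_of(outputs: List[int], top_carry: int, w: int) -> int:
    """Weight each unit's output by its position (outputs may exceed w bits)."""
    total = sum(v << (w * i) for i, v in enumerate(outputs))
    return total + (top_carry << (w * len(outputs)))


a_limbs = [0xFF, 0xFF, 0xFF]             # A = 0xFFFFFF with 8-bit words
outs, top = chain_multiply_word(a_limbs, 0xFF, 8)
print(value_of(outs, top, 8) == 0xFFFFFF * 0xFF)  # True
```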

Optionally, the Montgomery modular multiplication includes a division operation; the step of importing the split calculation variables into the corresponding multiplication units so that the multiplication units calculate in parallel to implement the modular multiplication operation comprises:

during the division operation, in each iteration of the multiplication array, passing the output result of each multiplication unit down to the lower-stage multiplication unit, thereby implementing the division operation.
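Passing every unit's result down one stage moves each word one position toward the least significant end, which divides the multi-word value by the base `2**w`; in Montgomery multiplication the division is exact because the lowest word has just been made zero. A minimal sketch with a hypothetical function name, assuming the most significant unit receives zero because it has no higher neighbour:

```python
from typing import List


def pass_down(limbs: List[int]) -> List[int]:
    """Each unit takes the word of the next higher unit; the top unit gets 0."""
    return limbs[1:] + [0]


w = 8
limbs = [0x00, 0x34, 0x12]              # value 0x123400: lowest word already zero
shifted = pass_down(limbs)
print(shifted)  # [52, 18, 0]
print(sum(v << (w * i) for i, v in enumerate(shifted)) == 0x123400 >> w)  # True
```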

Optionally, the adder of each multiplication unit comprises a carry register; the step of importing the split calculation variables into the corresponding multiplication units so that the multiplication units calculate in parallel to implement the modular multiplication operation comprises:

if the data currently stored in the carry register of at least one multiplication unit is not zero, adding at least one further iteration to the multiplication array so that the data stored in the carry registers of all multiplication units becomes zero.

Optionally, adding at least one iteration to the multiplication array comprises:

setting the calculation variables to 0 and inputting them to each multiplication unit of the multiplication array for operation;

and shifting the data held in the carry register of each stage's adder left into the carry register of the next higher stage.
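The extra iterations can be simulated as follows. Assumptions for illustration: each unit holds a `w`-bit sum word and a carry destined for the next higher unit, so the represented value is the sum of `sums[i] * 2**(w*i)` and `carries[i] * 2**(w*(i+1))`; during an extra iteration the calculation variables are zero, so each unit only adds the carry shifted up from the lower stage. The function name and this data layout are hypothetical.

```python
from typing import List, Tuple


def flush_carries(sums: List[int], carries: List[int], w: int) -> Tuple[List[int], int]:
    """Run zero-input iterations until every carry register is zero.

    Returns (final_sums, number_of_extra_iterations).
    """
    mask = (1 << w) - 1
    sums, carries = list(sums), list(carries)
    iterations = 0
    while any(carries):
        if carries[-1]:
            raise OverflowError("carry leaves the most significant unit")
        new_sums, new_carries = [], []
        for i, s in enumerate(sums):
            incoming = carries[i - 1] if i > 0 else 0   # carry shifted up from the lower stage
            t = s + incoming                             # calculation variables are zero
            new_sums.append(t & mask)
            new_carries.append(t >> w)
        sums, carries = new_sums, new_carries
        iterations += 1
    return sums, iterations


final, extra = flush_carries([15, 15, 15, 0], [1, 0, 0, 0], 4)
print(final, extra)  # [15, 0, 0, 1] 3
```

The represented value (4111 in the example) is unchanged; only its distribution between sum words and carry registers changes until every carry register holds zero.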

The application also provides a processing terminal comprising a multiplication array of the encryption processing circuit;

and/or,

the processing terminal comprises an interconnected processor and a storage medium, wherein:

the storage medium is used for storing a computer program;

the processor is configured to read the computer program and execute the computer program to implement the data processing method as described above.

The present application also provides a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the data processing method as described above.

With the multiplication array, data processing method, processing terminal and storage medium of this application, one multiplier in each multiplication unit is multiplexed to perform two multiplication operations, so the throughput of the multiplier array can be fully exploited. The multiple multiplication units of this scheme form a pipeline structure, which effectively alleviates the delay problem of large-bit-width multipliers, raises the operating frequency and greatly improves computational efficiency.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.

FIG. 1 is a block diagram of a multiplication array according to one embodiment of the present application.

Fig. 2 is a schematic diagram of a cascade of multiplication units according to an embodiment of the present application.

Fig. 3 is a signal input/output diagram of a multiplication unit according to an embodiment of the present application.

FIG. 4 is a flow chart of a data processing method according to an embodiment of the present application.

Fig. 5 is a schematic diagram illustrating a carry-over processing connection of a multiplication unit according to an embodiment of the present application.

The implementation, functional characteristics and advantages of the present application will be further described with reference to the embodiments and the accompanying drawings. Specific embodiments have been shown by way of example in the drawings and will be described in more detail herein. These drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but to illustrate the concepts of the present application to those skilled in the art by reference to specific embodiments.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the element defined by the phrase "comprising one … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element, and furthermore, elements having the same name in different embodiments of the present application may have the same meaning or may have different meanings, a particular meaning of which is to be determined by its interpretation in this particular embodiment or by further combining the context of this particular embodiment.

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

First embodiment

In one aspect, the present application provides a multiplication array of encryption processing circuits, and FIG. 1 is a block diagram of a multiplication array according to an embodiment of the present application.

As shown in fig. 1, the multiplication array of the encryption processing circuit comprises a plurality of sequentially cascaded multiplication units CELL, which are used for performing iterative operations on the calculation variables of the modular multiplication operation; each multiplication unit CELL includes a first multiplier MULT, a first adder ADD1 and a second adder ADD2.

The first multiplier MULT is configured to multiply a pair of variable multipliers, a first input terminal of the first multiplier MULT is configured to receive a first variable multiplier of the pair of variable multipliers, a second input terminal of the first multiplier MULT is configured to receive a second variable multiplier of the pair of variable multipliers, and a first output terminal of the first multiplier MULT is connected to a first input terminal of the first adder ADD1 to output a first calculation result to the first adder ADD1.

The second input end of the first adder ADD1 is configured to receive first carry data of a multiplication operation of a low-stage multiplication unit CELL, so that the first adder ADD1 performs a first addition operation on the first calculation result and the first carry data, and the output end of the first adder ADD1 is connected to the first input end of the second adder ADD2, so as to output a second calculation result to the second adder ADD2.

The second input end of the second adder ADD2 is configured to receive an output result of the higher-order multiplication unit CELL, so that the second adder ADD2 performs a second addition operation on the second calculation result and the output result of the higher-order multiplication unit CELL, so as to output an output result of the present-order multiplication unit CELL.

Fig. 2 is a schematic diagram of a cascade of multiplication units according to an embodiment of the present application. Fig. 3 is a signal input/output diagram of a multiplication unit according to an embodiment of the present application.

Referring to fig. 2 and 3, the input signals of the present-stage multiplication unit CELL may include a first input signal B_i, a second input signal N_i, a third input signal A_i and a fourth input signal Q_i. The result of the operation of the present-stage multiplication unit CELL is output as the output result S_i.

Illustratively, in the cascaded multiplication units CELL, each stage receives the data of its own stage. Taking the first input signal as an example, the present-stage multiplication unit CELL receives B_i, the lower-stage multiplication unit CELL receives B_{i-1}, and the higher-stage multiplication unit CELL receives B_{i+1}; the other input and output signals follow the same pattern and are not described in detail here. In the signal descriptions of the present application, a signal written without a stage index refers to the present stage.

In one embodiment, the first input end of the first multiplier MULT may receive the first input signal B_i or the second input signal N_i, and the second input end of the first multiplier MULT may receive the third input signal A_i or the fourth input signal Q_i. After the multiplication, the first multiplier MULT outputs the first calculation result to the first adder ADD1 through its first output end.

Optionally, the first multiplier MULT is connected to a higher-level multiplication unit CELL through a second output end, so as to output first carry data of the multiplication operation of the present-level multiplication unit CELL to the higher-level multiplication unit CELL.

With continued reference to fig. 2, the first input terminal of the first adder ADD1 may receive the first calculation result, the second input terminal of the first adder ADD1 may receive the first carry data of the lower stage multiplication unit CELL, the first adder ADD1 performs an addition operation on the first calculation result and the first carry data of the lower stage multiplication unit CELL, and the first output terminal of the first adder ADD1 outputs the addition result of the first adder ADD1 as the second calculation result.

With continued reference to fig. 2, the first input end of the second adder ADD2 may input the second calculation result, the second input end of the second adder ADD2 may input the output result of the higher-order multiplication unit CELL, the second adder ADD2 performs an addition operation on the second calculation result and the output result of the higher-order multiplication unit CELL, and the addition result of the second adder ADD2 is output through the first output end of the second adder ADD2 and is output through the calculation result output end of the multiplication unit CELL.

The connection order of the multiplier and the adders in this embodiment supports the operations required by the encryption process. Optionally, different data can be input to the multiplier array in different timing periods, so that one multiplier is multiplexed to perform two multiplication operations; this keeps the multiplier fully loaded throughout an iteration period, so the throughput of the multiplier array can be fully exploited. Illustratively, the first adder ADD1 and the second adder ADD2 are carry save adders with carry output ends and carry input ends, and each adder may obtain its carry input by delaying its own carry output by two beats, that is, by feeding its carry output back into its own carry input after two timing periods.

The multiplication array of this embodiment forms a pipeline structure by cascading a plurality of multiplication units CELL, which effectively alleviates the timing problem of large-bit-width multipliers. Furthermore, using carry save adders removes the long serial carry chain of wide additions, which makes a high radix feasible. The pipelined cascade of multiplication units allows a larger base-unit bit width, reduces the input-cycle overhead, raises the operating frequency and greatly improves computational efficiency. It should be noted that, in application, the multiplication array may be driven either by an existing data processing program or by a newly developed one to carry out its operations.

Optionally, the multiplication array of the encryption processing circuit further includes a first selector, where the first selector is connected in series between the output end of the first adder ADD1 and the first input end of the second adder ADD2, the first input end of the first selector is connected to the output end of the first adder ADD1, the second input end of the first selector receives the second calculation result of the higher-order multiplication unit CELL, and the output end of the first selector is connected to the first input end of the second adder ADD2, so that the second adder ADD2 performs the second addition operation on the second calculation result of the higher-order multiplication unit CELL.

Referring to fig. 2, the first clock period even is an even period and the second clock period odd is an odd period. The multiplication unit CELL may receive the first input signal B_i and the third input signal A_i in the first clock period even, and the second input signal N_i and the fourth input signal Q_i in the second clock period odd. Accordingly, the first calculation results output by the first multiplier MULT in the first clock period even and in the second clock period odd correspond to different input signals. Illustratively, in the first clock period even, the first multiplier MULT multiplies the first input signal B_i by the third input signal A_i, i.e. computes A_i*B_i, and outputs the first calculation result A_i*B_i through its first output end, while the first carry data output from its second output end is the carry part of A_i*B_i. In the second clock period odd, the first multiplier MULT multiplies the second input signal N_i by the fourth input signal Q_i, i.e. computes Q_i*N_i, and outputs the first calculation result, while the first carry data output from its second output end is the carry part of Q_i*N_i.

Accordingly, in the first clock period even, the first adder ADD1 receives the first calculation result A_i*B_i through its first input end and the first carry data of A_{i-1}*B_{i-1} from the lower-stage multiplication unit CELL through its second input end, and outputs the sum of the two as the second calculation result (A_i*B_i)LSB. In the second clock period odd, the first adder ADD1 receives the first calculation result Q_i*N_i through its first input end and the first carry data of Q_{i-1}*N_{i-1} from the lower-stage multiplication unit CELL through its second input end, and outputs the sum of the two as (Q_i*N_i)LSB.

Optionally, the second adder ADD2 performs the second addition operation on the second calculation result and the output result of the higher-stage multiplication unit CELL at the first timing, and performs the second addition operation on the output result of the present-stage multiplication unit CELL and the second calculation result of the higher-stage multiplication unit CELL at the second timing. Illustratively, the first timing is the first clock period even in the embodiment of fig. 2, and the second timing is the second clock period odd in the embodiment of fig. 2.

Accordingly, through the selection of the first selector, the first input end of the second adder ADD2 may receive the second calculation result (A_i*B_i)LSB in the first clock period even, and the second calculation result (Q_i*N_i)LSB of the higher-stage multiplication unit CELL in the second clock period odd.

Optionally, the multiplication array of the encryption processing circuit further comprises a second selector connected in series between the second input and the output of the second adder ADD2;

the first input end of the second selector receives the output result of the higher-order multiplication unit CELL, the second input end of the second selector is connected with the output end of the second adder ADD2, and the output end of the second selector is connected with the second input end of the second adder ADD2, so that the second adder ADD2 performs the second addition operation on the output result of the present-order multiplication unit CELL.

With continued reference to fig. 2, illustratively, through the second selector, the second input end of the second adder ADD2 may receive the output result S_{i+1} of the higher-stage multiplication unit CELL in the first clock period even, and a third calculation result, namely the output result of the present-stage multiplication unit CELL, in the second clock period odd.
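The even/odd schedule described above can be summarised in a behavioural sketch of one unit over one iteration: the same multiplier handles the A_i*B_i pair in the even cycle and the Q_i*N_i pair in the odd cycle, and the second adder first adds the higher-stage result S_{i+1} and then accumulates onto its own result through the feedback path. Assumptions: the "LSB" part is the low `w` bits of a product and the carry part is the rest, the odd-cycle term added by the second adder is modeled as this unit's own (Q_i*N_i)LSB, adders are exact integer additions, and the function name is hypothetical; pipeline registers and carry-save details are omitted.

```python
from typing import Tuple


def cell_iteration(a_i: int, b_i: int, q_i: int, n_i: int,
                   carry_ab_lower: int, carry_qn_lower: int,
                   s_higher: int, w: int) -> Tuple[int, int, int]:
    """Return (s_i, carry_ab_to_higher, carry_qn_to_higher) for one even+odd cycle pair."""
    mask = (1 << w) - 1

    # Even cycle: MULT computes A_i * B_i; the first selector passes ADD1's result,
    # the second selector passes S_{i+1} from the higher stage.
    p_even = a_i * b_i
    add1_even = (p_even & mask) + carry_ab_lower
    s_i = add1_even + s_higher

    # Odd cycle: the same MULT computes Q_i * N_i; the second selector now feeds
    # back the unit's own result, so ADD2 accumulates onto s_i.
    p_odd = q_i * n_i
    add1_odd = (p_odd & mask) + carry_qn_lower
    s_i = s_i + add1_odd

    return s_i, p_even >> w, p_odd >> w


print(cell_iteration(200, 3, 255, 255, 1, 2, 10, 8))  # (102, 2, 254)
```

One multiplier thus performs one multiplication in every clock cycle, which is the multiplexing the description relies on for full throughput.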

Optionally, the multiplication array of the encryption processing circuit further includes a third selector, where the third selector is connected in series to an output end of the second adder ADD2, a first input end of the third selector is connected to an output end of the second adder ADD2, a second input end of the third selector is set to zero, and an output end of the third selector is connected to a low-stage multiplication unit CELL.

With continued reference to fig. 2, illustratively, the output result of the multiplication unit CELL passes through the third selector, which, according to the calculation requirements, either outputs zero to the first input end of the second selector of the lower-stage multiplication unit CELL or passes the output result of the present-stage multiplication unit CELL to the fifth register regS.

Optionally, the multiplication array of the encryption processing circuit further comprises a first register regX; the first register regX is connected in series to the first input terminal of the first multiplier MULT to receive and temporarily store the first variable multiplier.

The first variable multiplier temporarily stored in the first register regX may be the first input signal B_i or the second input signal N_i, which is output to the first input end of the first multiplier MULT in different timing periods.

Optionally, the multiplication array of the encryption processing circuit further comprises a second register regY. The second register regY is connected in series to the second input end of the first multiplier MULT to receive and temporarily store the second variable multiplier.

The second variable multiplier temporarily stored in the second register regY may be the third input signal $A_i$ or the fourth input signal $Q_i$. These values are output to the second input terminal of the first multiplier MULT in different timing periods.

Optionally, the multiplication array of the encryption processing circuit further comprises a third register reg1; the third register reg1 is connected in series between the first input terminal of the first adder ADD1 and the first output terminal of the first multiplier MULT to receive and temporarily store the first calculation result.

Illustratively, the third register reg1 may register the first calculation result for output to the first input of the first adder ADD1 during the calculation.

Optionally, the multiplication array of the encryption processing circuit further comprises a fourth register reg2; the fourth register reg2 is connected in series to the second input end of the first adder ADD1, so as to receive and temporarily store the first carry data of the multiplication operation of the low-stage multiplication unit CELL.

Illustratively, the fourth register reg2 may temporarily store the first carry data output by the lower stage multiplication unit CELL for output to the second input of the first adder ADD1 during the calculation.

Optionally, the multiplication array of the encryption processing circuit further includes a fifth register regS, where the fifth register regS is connected in series to the output end of the second adder ADD2, so as to receive and temporarily store the output result.

Illustratively, the fifth register regS may register the selection result of the third selector for outputting to the first input terminal of the second selector of the lower stage multiplication unit CELL during the calculation.

Optionally, the multiplication array of the encryption processing circuit further includes a sixth register regW, which is serially connected between the output terminal of the second adder ADD2 and the second input terminal of the second adder ADD2, to receive and temporarily store the output result.

Illustratively, the sixth register regW may temporarily store the output result of the multiplication unit CELL, so as to output the output result to the second input terminal of the second selector of the present stage multiplication unit CELL in the second timing period in the calculation process.

Optionally, the first adder ADD1 is a carry save adder, and the first adder ADD1 includes two stages of registers connected in series, where the two stages of registers are used to receive and temporarily store the second carry data of the first addition operation and latch for two clock cycles.

Optionally, the second adder ADD2 is a carry save adder, and the second adder ADD2 includes two stages of registers connected in series, where the two stages of registers connected in series are configured to receive and temporarily store the third carry data of the second addition operation and latch for two clock cycles.

Illustratively, the carry register of each adder consists of two registers in series. The current carry output is latched through these two serial stages before it is fed back into the carry input. This two-stage buffering is needed because the multiplication array takes its inputs over two clock cycles, the first (even) clock cycle and the second (odd) clock cycle. A carry therefore has to wait two cycles before it rejoins the operation it belongs to.
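The following Python sketch is a simplified behavioral model of why a two-stage carry register matches the even/odd input schedule. It is not the patent's circuit. A single unit-wide adder alternates between two independent multi-unit additions, and each carry re-enters exactly two cycles later. `RADIX = 8`, `interleaved_add` and the helper names are hypothetical. The actual units use carry save adders at radix = 64.

```python
RADIX = 8                  # hypothetical unit width for the demo (the design uses 64)
MASK = (1 << RADIX) - 1


def to_units(x, n):
    """Split x into n little-endian RADIX-bit units."""
    return [(x >> (RADIX * k)) & MASK for k in range(n)]


def from_units(units):
    """Reassemble little-endian RADIX-bit units into an integer."""
    return sum(u << (RADIX * k) for k, u in enumerate(units))


def interleaved_add(a, b, c, d, n):
    """Compute a + b and c + d with ONE unit-wide adder, alternating every cycle.

    Even cycles add unit k of (a, b); odd cycles add unit k of (c, d).
    The carry out passes through two serial registers, so it is consumed two
    cycles later, i.e. by the next unit of the SAME (even or odd) addition.
    """
    la, lb, lc, ld = (to_units(v, n) for v in (a, b, c, d))
    stream = []
    for k in range(n):
        stream.append((la[k], lb[k]))     # even cycle
        stream.append((lc[k], ld[k]))     # odd cycle
    pipe = [0, 0]                         # pipe[0]: carry of cycle t-1, pipe[1]: of t-2
    out = []
    for x, y in stream:
        total = x + y + pipe[1]           # use the carry latched two cycles ago
        out.append(total & MASK)
        pipe = [total >> RADIX, pipe[0]]  # shift the two-stage carry register
    # After the last (odd) cycle, pipe[1] holds the even stream's final carry.
    sum_ab = from_units(out[0::2]) + (pipe[1] << (RADIX * n))
    sum_cd = from_units(out[1::2]) + (pipe[0] << (RADIX * n))
    return sum_ab, sum_cd
```

For example, `interleaved_add(0xFFFF, 1, 0x1234, 0xFFFF, 2)` returns `(0x10000, 0x11233)`. Both carry chains stay separate even though they share one adder.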

Second embodiment

Based on the first embodiment, the present application further provides a data processing method. The method is applied to the multiplication array of the encryption processing circuit to implement modular multiplication in the Montgomery representation. Fig. 4 is a flowchart of the data processing method according to an embodiment of the present application.

As shown in fig. 4, in an embodiment, the data processing method includes:

S10: Set a preset base value, which represents the unit bit width of a multiplication unit.

S20: Split the calculation variables of the Montgomery modular multiplication according to the preset base value, so that each part matches the unit bit width of one multiplication unit in the multiplication array.

S30: Import the split calculation variables into the corresponding multiplication units, so that the multiplication units compute in parallel and carry out the modular multiplication.

The asymmetric encryption algorithm RSA requires a large number of modular multiplications. Montgomery modular multiplication is particularly effective for this task. It reduces modular multiplication to operations that are easy to implement in hardware, such as shifts, additions and multiplications, and has therefore become the theoretical basis for hardware implementations of asymmetric encryption algorithms. The Montgomery modular multiplication expression is as follows:

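a) Pre-computation: the modulus variant is $N_{var} = N \cdot (N' \bmod \beta)$, where $(-N \cdot N') \bmod R = 1$, $R = \beta^n$, $\beta = 2^{radix}$, and radix is the preset base value (the Radix);

b) Set $S = 0$;

c) For $i = 0, 1, \ldots, n$ ($n = size_N / radix$):

c.1) $Q_i = S \bmod \beta$;

c.2) $S = \lfloor S/\beta \rfloor + \lfloor (Q_i \cdot N_{var})/\beta \rfloor + A_i \cdot B + ((Q_i = 0)\ ?\ 0 : 1)$;

d) Output $S = A \cdot B \cdot R^{-1} \pmod N$.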
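Steps a) to d) can be checked with a small Python reference model. This is a sketch for verification, not the array implementation. It processes whole integers instead of units, computes N' with Python's built-in `pow(N, -1, R)` (Python 3.8+), and assumes N is odd and A, B < N. The function name `montgomery_multiply` is hypothetical. The in-loop assertion shows why step c.2 adds 1 when $Q_i \neq 0$. Because $N_{var} \equiv -1 \pmod{\beta}$, the sum $S + Q_i N_{var}$ is an exact multiple of β, and splitting it into two right shifts makes the quotient exactly 1 too small.

```python
def montgomery_multiply(A, B, N, radix):
    """Word-serial Montgomery multiplication following steps a) to d).

    Returns (S, R) with S congruent to A * B * R^-1 (mod N); S is not
    necessarily below N. Requires N odd and 0 <= A, B < N.
    """
    beta = 1 << radix                        # beta = 2^radix
    n = -(-N.bit_length() // radix)          # n = ceil(sizeN / radix)
    R = beta ** n                            # R = beta^n
    n_prime = (-pow(N, -1, R)) % R           # N * N' = -1 (mod R)
    n_var = N * (n_prime % beta)             # a) modulus variant Nvar
    S = 0                                    # b)
    for i in range(n + 1):                   # c) i = 0 .. n, with A_n = 0
        a_i = (A >> (radix * i)) & (beta - 1)
        q = S % beta                         # c.1) Q_i = S mod beta
        # Nvar = -1 (mod beta), so S + Q_i*Nvar is an exact multiple of beta;
        # two separate floor divisions lose exactly 1 when Q_i != 0.
        assert (S + q * n_var) % beta == 0
        S = S // beta + (q * n_var) // beta + a_i * B + (0 if q == 0 else 1)
    return S, R                              # d)
```

For a 2048-bit odd N and radix = 64, the loop runs n + 1 = 33 times. `S % N` then gives the canonical representative.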

The parallel modular multiplication is based on the Montgomery algorithm. Its bottom layer is an array of multiplication units, each of which multiplexes its multiplier.

The solution further maps step c of the Montgomery expression onto a concrete multiplication array. Referring to figs. 2 and 3, optionally, a preset base value radix is set to represent the unit bit width. The main variables in the calculation, such as N, A, B, Q and S, are split into n data units according to radix, for example $[N_{n-1}, \ldots, N_i, \ldots, N_0]$, $[A_{n-1}, \ldots, A_i, \ldots, A_0]$, $[B_{n-1}, \ldots, B_i, \ldots, B_0]$ and $[S_{n-1}, \ldots, S_i, \ldots, S_0]$. Here n = size/radix, and size is the modulus bit width of RSA (typically 1024, 2048 or 4096 bits). S is the intermediate result held inside the modular multiplication circuit. It is split into the grouped array $[S_{n-1}, \ldots, S_i, \ldots, S_0]$, updated over multiple iterations, and finally used as the output data of the multiplication array.
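Steps S10 and S20 amount to choosing radix and slicing each operand into radix-bit units. The following is a minimal Python sketch with hypothetical function names. List index i holds unit $X_i$, so the list runs least significant first, which is the reverse of the bracket notation above.

```python
def split_into_units(value, size, radix):
    """S20: split a size-bit variable into n = size // radix units of radix bits.

    Returns [X_0, X_1, ..., X_{n-1}] (least significant unit first).
    """
    if size % radix != 0:
        raise ValueError("size must be a multiple of radix")
    if value < 0 or value >> size:
        raise ValueError("value must fit in size bits")
    n = size // radix
    mask = (1 << radix) - 1
    return [(value >> (radix * i)) & mask for i in range(n)]


def join_units(units, radix):
    """Inverse of split_into_units."""
    return sum(u << (radix * i) for i, u in enumerate(units))
```

With size = 2048 and radix = 64, `split_into_units(N, 2048, 64)` yields the 32 values that are spread across the 32 multiplication units.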

In this embodiment, the data processing method works together with the multiplication array of the encryption processing circuit. The pipeline structure of the array effectively alleviates the timing problem of large-bit-width multipliers. The carry save adders remove the serial carry-propagation logic of large-number addition, which makes a high radix feasible. Cascading multiple multiplication units into a pipeline increases the bit width of the base unit, reduces the input-cycle overhead and raises the operating frequency, which greatly improves computational efficiency.

Referring to FIGS. 2-4, illustratively, $Q_i = S \bmod \beta$ in the Montgomery expression can be realized by taking the lowest radix bits of S at each iteration. In other words, $Q_i$ is the current iteration value of the lowest multiplication unit, $S_0$.
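In software terms, this is a bit slice rather than a division. The Python sketch below assumes S is held in normalized form, with every unit below β and no pending carries. `q_from_units` and `to_units` are hypothetical names.

```python
RADIX = 64
BETA = 1 << RADIX


def to_units(s, n):
    """Split S into n RADIX-bit units [S_0, ..., S_{n-1}]."""
    return [(s >> (RADIX * i)) & (BETA - 1) for i in range(n)]


def q_from_units(s_units):
    """Q_i = S mod beta is simply the content of the lowest unit S_0."""
    return s_units[0]
```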

Referring to FIGS. 2-4, illustratively, the terms $A_i \cdot B$ and $Q_i \cdot N_{var}$ in the Montgomery expression are handled by splitting A, B, Q and N into n units, forming the partial products unit by unit, and accumulating them iteratively. In the implementation, the multiplication array is arranged by multiplication unit. The split N and B values are imported in parallel, A and Q are imported serially and alternately across the multiplication units, and all units perform their multiplications simultaneously. This reduces the problem to solving $A_i \cdot B_i$ and $Q_i \cdot N_i$ within each unit. To account for the multiplication carry, the product of the present-stage unit i must be added to the carry (high half) of the lower-stage unit i-1. That is, $A_i \cdot B_i$ in a single unit is implemented as $\mathrm{LSB}(A_i \cdot B_i) + \mathrm{MSB}(A_{i-1} \cdot B_{i-1})$, and likewise $Q_i \cdot N_i$ is implemented as $\mathrm{LSB}(Q_i \cdot N_i) + \mathrm{MSB}(Q_{i-1} \cdot N_{i-1})$.
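The LSB/MSB split can be checked numerically. The Python sketch below simplifies the schedule by assuming that the scalar operand a, which stands for $A_i$ or $Q_i$, reaches every unit in the same cycle. `unit_products` is a hypothetical name. Each unit output can exceed radix bits by one bit. The carry save adders absorb this redundant form.

```python
def unit_products(a, b_units, radix):
    """Per-unit outputs for a * B when the scalar a is broadcast to every unit.

    Unit i outputs LSB(a * B_i) + MSB(a * B_{i-1}); the MSB of the top unit
    spills into one extra position n.
    """
    mask = (1 << radix) - 1
    prods = [a * b for b in b_units]        # each product is at most 2*radix bits
    lsb = [p & mask for p in prods]         # low half stays in unit i
    msb = [p >> radix for p in prods]       # high half goes to unit i+1
    out = [lsb[i] + (msb[i - 1] if i > 0 else 0) for i in range(len(prods))]
    out.append(msb[-1])                     # carry out of the top unit
    return out
```

Weighting the outputs by $\beta^i$ and summing them gives back $a \cdot B$ exactly, because each product's high half is counted one position up.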

Referring to FIGS. 2-4, illustratively, consider $S/\beta$ and $(Q_i \cdot N_{var})/\beta$ in the Montgomery expression. Since $\beta = 2^{radix}$, dividing by β shifts all multiplication units of S right by radix bits. This is exactly equivalent to passing each value from stage i+1 down to stage i. Thus $S/\beta$ reduces to each unit $S_{i+1}$ passing its value to the lower stage $S_i$, and $(Q_i \cdot N_{var})/\beta$ reduces to each unit's product $(Q_{i+1} \cdot N_{i+1})$ passing to the lower stage $(Q_i \cdot N_i)$.
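At the unit level, the division therefore needs no arithmetic, only a re-wiring of units. The following is a minimal Python sketch with hypothetical helper names, assuming normalized units.

```python
RADIX = 64
MASK = (1 << RADIX) - 1


def to_units(x, n):
    """Split x into n RADIX-bit units, least significant first."""
    return [(x >> (RADIX * i)) & MASK for i in range(n)]


def from_units(units):
    """Reassemble RADIX-bit units into an integer."""
    return sum(u << (RADIX * i) for i, u in enumerate(units))


def divide_by_beta(units):
    """S / beta: unit i takes the value of unit i+1; the top unit receives 0."""
    return units[1:] + [0]
```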

Finally, all $A_i$ are imported one after another in the order $A_0, \ldots, A_{n-1}$. In the implementation, $Q_i$ does not need to be imported, because each $Q_i$ equals the current $S_0$. The outputs of all the multiplication units then form $[S_{n-1}, \ldots, S_i, \ldots, S_0]$, which is the Montgomery modular product $S = A \cdot B \cdot R^{-1} \bmod N$.

The ports and circuit design of the multiplication units of the multiplication array are shown in fig. 2. Step c of the Montgomery expression performs two multiplications and then sums them. To reduce resource overhead, the bottom-layer multiplication unit circuit may therefore multiplex its multiplier, for example using 1 multiplier and 2 carry save adders. Illustratively, the embodiment of fig. 2 shows the circuit structure of two of the multiplication units of the bottom-layer multiplication array. There are n = 2048/64 = 32 such multiplication units in total, because the design supports up to RSA2048 and the arrays are grouped with a preset base value of radix = 64. Specifically, the multiplier uses the standard-library MULTI cell, which is optimized for area and timing. Each adder keeps its carry for 2 beats (clock cycles), so the saved carry serves exactly as the carry input for the next $A_i$. This solves the carry-cascade problem of large-number addition.

In the present embodiment, B and $N_{var}$ are input in parallel, spread across the multiplication units, and remain unchanged throughout the iteration. A and Q are input alternately: $A_i$ in even cycles and $Q_i$ in odd cycles. This lets one multiplier perform both multiplications and keeps every multiplier busy in every cycle of the iteration, which pushes the throughput of the multiplier array to its limit.
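The following Python sketch shows this input schedule for one unit. `unit_schedule` and the operand labels are hypothetical. The sketch only lists which operand pair reaches the unit's single multiplier in each cycle. It shows that $B_j$ and $N_{var,j}$ stay fixed while the serial operand alternates between $A_i$ and $Q_i$, so no cycle is left idle.

```python
def unit_schedule(unit, n_iterations):
    """Operand pairs fed to the single multiplier of one unit, cycle by cycle.

    Even cycles: A_i * B_unit; odd cycles: Q_i * Nvar_unit. B_unit and
    Nvar_unit stay fixed for the whole run; only the serial operand alternates.
    """
    schedule = []
    for i in range(n_iterations):
        schedule.append((2 * i, f"A{i}", f"B{unit}"))         # even cycle
        schedule.append((2 * i + 1, f"Q{i}", f"Nvar{unit}"))  # odd cycle
    return schedule
```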

At a given frequency, the logic delay within one clock cycle must satisfy the setup-time requirement, which bounds the maximum delay, and the hold-time requirement, which bounds the minimum delay. Otherwise, timing problems arise in the circuit design. The logic complexity of a circuit therefore tends to limit its clock frequency. A well-designed logic structure allows a higher frequency and thus higher performance.
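In textbook form, the single-clock-domain timing constraints can be written as follows. This simplification ignores clock skew and jitter, and the symbols are introduced here rather than taken from the source. The second line evaluates the clock period budget at the 600 MHz figure quoted for this design.

$$T_{clk} = \frac{1}{f_{clk}}, \qquad t_{clk \to q} + t_{logic,\max} + t_{setup} \le T_{clk}, \qquad t_{clk \to q} + t_{logic,\min} \ge t_{hold}$$

$$f_{clk} = 600\ \text{MHz} \;\Rightarrow\; T_{clk} = \frac{1}{600 \times 10^{6}\ \text{Hz}} \approx 1.67\ \text{ns}$$

Pipelining reduces $t_{logic,\max}$ per stage, which is how it relaxes the setup constraint and raises $f_{clk}$.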

Referring to fig. 2, the multiplication units in the multiplication array of this embodiment may be designed as a pipeline structure, which effectively alleviates the timing problem of large-bit-width multipliers. The carry save adders remove the serial carry logic of large-number addition and make a high radix feasible. A high-radix scheme for RSA2048 (downward compatible with RSA1024) can therefore be implemented with radix = 64, and the operating frequency on an ASIC can exceed 600 MHz. The pipeline design increases the bit width of the base unit, reduces the input-cycle overhead and raises the operating frequency, which greatly improves computational efficiency.

Illustratively, with the circuit implementation of this embodiment, part of the non-zero carry results may remain saved in registers at the bottom layer after each multiplication is completed. This remainder is called a residue, and the circuit handles it in an ending process. As shown in fig. 5, the multiplication array of the present application iterates over the inputs A[0, 1, …, n-1]. After the final $A_{n-1}$ has been input, a supplementary check determines whether all carry save registers of the multiplication units CELL are 0. If so, there is no carry residue, and the values S[0, 1, …, n-1] are output directly. Otherwise, one more iteration (the n-th round) is performed. In this round, $A_n$, $Q_{n+1}$, B and N are all input as 0, and the adder carry-save value of each multiplication unit CELL is shifted left into the carry save register of the previous stage. Here, the carry save registers are the registers that hold the carry outputs co of ADD1 and ADD2. Fig. 2 shows them schematically as the two small squares beside the first adder ADD1 and the second adder ADD2. When this additional n-th iteration is completed, the array again checks whether any multiplication unit CELL still holds a carry, and repeats the flow described above if needed. Experiments show that one supplementary iteration is usually enough to clear the residue, and that multiple supplementary iterations are rarely needed.
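The supplementary iteration can be sketched in Python as follows. This is a behavioral model under stated assumptions, not the patent's RTL. Each unit is assumed to hold a radix-bit value plus one saved carry bit worth the next unit's weight, and "shifted left to the previous stage" is read as handing the carry to the next higher unit. `flush_carry_residue` is a hypothetical name. Because A, Q, B and N are fed as zero, each round only propagates the saved carries, and the loop repeats until no residue remains.

```python
def flush_carry_residue(s_units, carries, radix):
    """Supplementary iterations with A, Q, B and N all fed as 0.

    s_units[i] : radix-bit value held by unit i (0 <= s < 2**radix)
    carries[i] : saved carry of unit i, worth 2**(radix * (i + 1))
    Each round, every unit adds the carry handed up by the unit below it.
    Returns (normalized units, number of supplementary rounds used).
    """
    mask = (1 << radix) - 1
    s_units, carries = list(s_units), list(carries)
    rounds = 0
    while any(carries):                  # residue check on all carry registers
        if carries[-1]:
            raise OverflowError("carry out of the top unit")
        totals = [s + (carries[i - 1] if i else 0) for i, s in enumerate(s_units)]
        s_units = [t & mask for t in totals]
        carries = [t >> radix for t in totals]
        rounds += 1
    return s_units, rounds
```

For example, `flush_carry_residue([5, 7, 0], [1, 0, 0], 8)` clears in one round and returns `([5, 8, 0], 1)`. By contrast, `flush_carry_residue([0xFF, 0xFF, 0x00], [1, 0, 0], 8)` needs two rounds, because the carry ripples through a full unit. This corresponds to the rare multi-round case.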

Third embodiment

The application also provides a processing terminal comprising a multiplication array of encryption processing circuits as described above.

Optionally, the processing terminal includes an interconnected processor and storage medium, wherein:

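the storage medium is used for storing a computer program;

the processor is used for reading the computer program and running the data processing method described above.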

In the processing terminal embodiment of the present application, an ASIC (application-specific integrated circuit) implementation for RSA2048 (downward compatible with RSA1024) is designed. In terms of resources, the maximum size = 2048, radix = 64, and n = size/radix = 32. The modular multiplication array thus has 32 multiplication units CELL, and each multiplication unit processes a 64-bit width (i.e., it instantiates 1 × multi-64 and 2 × add-64). This embodiment supports both nonCRT (non-Chinese-remainder-theorem) mode and CRT (Chinese-remainder-theorem) mode, so the modular multiplication array can be deployed in two ways.

In nonCRT mode, which is typically used for public-key operations, the 32 multiplication units are cascaded in sequence. They correspond to one set of 2048-bit latch registers B and N and one set of 2048-bit shift registers A. The valid bits depend on whether size = 1024 or 2048. For example, RSA2048 occupies cells [1][0] and A, B, N [1][0], while RSA1024 occupies only cells [0] and A, B, N [0].

In CRT mode, which is typically used for private-key operations, the multiplication units are split into 2 independent parallel groups. The 16 multiplication units CELL in each group are cascaded in sequence and correspond to two separate 1024-bit latch registers B [0] [1] and N [0] [1] and two 1024-bit shift registers A [0] [1]. With this structure, CRT mode reuses the same resources as nonCRT mode. It splits one large modular multiplication into two smaller modular multiplications of half the size and provides parallel modular multiplication for the upper-layer modular exponentiation, achieving a speedup of up to 4 times.

Specifically, CRT and nonCRT use different key representations: a nonCRT private key is (n, d), and a CRT private key is (p, q, dp, dq, qinv). Accordingly, the execution parameters and flows of the initialization and operation stages differ slightly. In nonCRT mode, the initialization stage solves for Nvar (e.g., 1024 bits), and the operation stage solves for $m = c^{d} \bmod n$. In CRT mode, the initialization stage solves for Pvar (e.g., 512 bits) and Qvar (another 512 bits). The operation stage then solves for $m_1 = c^{dp} \bmod p$, $m_2 = c^{dq} \bmod q$, $h = qinv \cdot (m_1 - m_2) \bmod p$ and $m = m_2 + h \cdot q$. All of $m_1$, $m_2$, $h$ and $m$ are calculated by multiplexing the multiplication array. The two most time-consuming computations, $m_1$ and $m_2$, are executed completely independently in parallel.
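The CRT flow can be illustrated in Python, with the built-in three-argument `pow` standing in for the array-based modular exponentiation. The tiny textbook key (p = 61, q = 53, e = 17) is for illustration only, and `rsa_crt_decrypt` is a hypothetical name. The two `pow` calls share no data, which is what allows the two 16-unit halves of the array to compute them in parallel.

```python
def rsa_crt_decrypt(c, p, q, dp, dq, qinv):
    """CRT private-key operation; m1 and m2 are independent of each other."""
    m1 = pow(c, dp, p)             # half-size modular exponentiation mod p
    m2 = pow(c, dq, q)             # half-size modular exponentiation mod q
    h = (qinv * (m1 - m2)) % p     # Python's % keeps h in [0, p)
    return m2 + h * q              # recombine: m = m2 + h * q, with 0 <= m < p*q
```

Usage with the textbook key: compute `d = pow(17, -1, 60 * 52)`, then `dp = d % 60`, `dq = d % 52`, `qinv = pow(53, -1, 61)`. For any ciphertext c, `rsa_crt_decrypt(c, 61, 53, dp, dq, qinv)` matches the nonCRT result `pow(c, d, 61 * 53)`.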

Therefore, for private-key operations, CRT-RSA achieves up to a 4× speedup over nonCRT. The performance evaluation is shown in Table 1. Built on an ASIC, this scheme realizes a CRT-RSA circuit with maximized performance and minimized area.

Table 1 private key operation Performance evaluation reference table

Fourth embodiment

With the multiplication array, data processing method, processing terminal and storage medium of the present application, one multiplier is multiplexed in each multiplication unit to perform two multiplications, which pushes the throughput of the multiplier array to its limit. The cascaded multiplication units can adopt a pipeline structure, which effectively alleviates the timing problem of large-bit-width multipliers and reduces the input-cycle overhead. In addition, the carry save adders remove the serial carry logic of large-number addition, which makes a high radix feasible. A high-radix scheme for RSA2048 (downward compatible with RSA1024) is realized with radix = 64, and the operating frequency on an ASIC exceeds 600 MHz. The higher frequency greatly improves computational efficiency.

In this application, step numbers such as S10 and S20 are used for the purpose of more clearly and briefly describing the corresponding content, and are not to constitute a substantial limitation on the sequence, and those skilled in the art may execute S20 first and then S10 when implementing the present invention, but these are all within the scope of protection of the present application.

The embodiments of the system and the storage medium provided in the present application may include all the technical features of any one of the embodiments of the method, where the expansion and explanation of the description are substantially the same as those of each embodiment of the method, and are not repeated herein.

The present embodiments also provide a computer program product comprising computer program code which, when run on a computer, causes the computer to perform the method in the various possible implementations as above.

The embodiments also provide a chip including a memory for storing a computer program and a processor for calling and running the computer program from the memory, so that a device on which the chip is mounted performs the method in the above possible embodiments.

It can be understood that the above scenario is merely an example, and does not constitute a limitation on the application scenario of the technical solution provided in the embodiments of the present application, and the technical solution of the present application may also be applied to other scenarios. For example, as one of ordinary skill in the art can know, with the evolution of the system architecture and the appearance of new service scenarios, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.

The serial numbers of the foregoing embodiments are for description only and do not indicate the relative merits of the embodiments.

The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.

The units in the device of the embodiment of the application can be combined, divided and pruned according to actual needs.

In this application, the same or similar term concept, technical solution, and/or application scenario description will generally be described in detail only when first appearing, and when repeated later, for brevity, will not generally be repeated, and when understanding the content of the technical solution of the present application, etc., reference may be made to the previous related detailed description thereof for the same or similar term concept, technical solution, and/or application scenario description, etc., which are not described in detail later.

In this application, the description of each embodiment has its own emphasis. For details not described in one embodiment, refer to the related descriptions of other embodiments.

The technical features of the technical solutions of the present application may be arbitrarily combined, and for brevity of description, all possible combinations of the technical features in the above embodiments are not described, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the present application.

The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the claims, and all equivalent structures or equivalent processes using the descriptions and drawings of the present application, or direct or indirect application in other related technical fields are included in the scope of the claims of the present application.

Claims

1. The multiplication array is characterized by comprising a plurality of multiplication units which are sequentially cascaded, wherein the multiplication units are used for carrying out iterative operation on calculation variables in modular multiplication operation;

2. The multiplication array of claim 1 wherein the first multiplier is coupled to a higher-order multiplication unit through a second output to output first carry data of a multiplication operation of the present-order multiplication unit to the higher-order multiplication unit.

3. The multiplication array of claim 2 further comprising a first selector coupled in series between an output of the first adder and a first input of the second adder, the first input of the first selector coupled to the output of the first adder, the second input of the first selector receiving a second calculation result of a higher order multiplication unit, the output of the first selector coupled to the first input of the second adder, such that the second adder performs the second addition operation on the second calculation result of the higher order multiplication unit.

4. A multiplication array according to claim 3 further comprising a second selector connected in series between a second input and an output of said second adder;

5. The multiplication array of claim 4 further comprising a third selector, wherein the third selector is coupled in series with the output of the second adder, wherein a first input of the third selector is coupled to the output of the second adder, wherein a second input of the third selector is zeroed, and wherein an output of the third selector is coupled to a lower stage of multiplication units.

6. The multiplication array of claim 5 wherein the second adder performs a second addition operation of the second calculation result and the output result of the higher-order multiplication unit at a first timing, and performs a second addition operation of the output result of the present-order multiplication unit and the second calculation result of the higher-order multiplication unit at a second timing.

7. The multiplication array of any one of claims 1-6, wherein the multiplication array further comprises a first register;

and/or,

the multiplication array further includes a second register;

and/or,

the multiplication array further includes a third register;

and/or,

the multiplication array further includes a fourth register; the fourth register is connected in series with the second input end of the first adder so as to receive and temporarily store the first carry data of the multiplication operation of the low-level multiplication unit;

and/or,

the multiplication array further comprises a fifth register, wherein the fifth register is connected in series with the output end of the second adder so as to receive and temporarily store the output result;

and/or,

the multiplication array further comprises a sixth register connected in series between the output of the second adder and the second input of the second adder to receive and temporarily store the output result.

8. A data processing method applied to the multiplication array of any one of claims 1 to 7 to implement a modular multiplication operation of a montgomery representation, the data processing method comprising:

9. The data processing method of claim 8, wherein the modular multiplication of the montgomery representation includes a modulo operation; the step of importing the split calculation variable into a plurality of corresponding multiplication units so as to enable the multiplication units to calculate in parallel, and the step of realizing the modular multiplication operation comprises the following steps:

10. The data processing method of claim 8, wherein the modular multiplication of the montgomery representation includes a multiplication operation; the step of importing the split calculation variable into a plurality of corresponding multiplication units so as to enable the multiplication units to calculate in parallel, and the step of realizing the modular multiplication operation comprises the following steps:

11. The data processing method of claim 8, wherein the modular multiplication operation of the montgomery representation comprises a division operation; the step of importing the split calculation variable into a plurality of corresponding multiplication units so as to enable the multiplication units to calculate in parallel, and the step of realizing the modular multiplication operation comprises the following steps:

12. A data processing method as claimed in any one of claims 8 to 11, wherein the adder of each multiplication unit comprises a carry register; the step of importing the split calculation variable into a plurality of corresponding multiplication units so as to enable the multiplication units to calculate in parallel, and the step of realizing the modular multiplication operation comprises the following steps:

13. The data processing method of claim 12, wherein the step of adding at least one iterative operation to the multiplication array comprises:

14. A processing terminal comprising the multiplication array of any one of claims 1-7;

and/or,

the storage medium is used for storing a computer program;

the processor being adapted to read the computer program and to run the data processing method according to any of claims 8-13.

15. A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the data processing method according to any of claims 8-13.