CN116540977A

CN116540977A - Modulo multiplier circuit, FPGA circuit and ASIC module

Info

Publication number: CN116540977A
Application number: CN202310813349.6A
Authority: CN
Inventors: Name withheld at the inventor's request
Original assignee: Beijing Real AI Technology Co Ltd
Current assignee: Beijing Real AI Technology Co Ltd
Priority date: 2023-07-05
Filing date: 2023-07-05
Publication date: 2023-08-04
Anticipated expiration: 2043-07-05
Also published as: CN116540977B

Abstract

The invention provides a modular multiplier circuit, an FPGA circuit and an ASIC module. The modular multiplier circuit includes a first multiplier, a second multiplier, a third multiplier, a first adder, a second adder and a two-way multiplexer, wherein: the output end of the first multiplier outputs the product p1 of $x_0$ and $x_1' = \lfloor x_1 \cdot 2^n / m \rfloor$; the output end of the second multiplier outputs the product p2 of p1 and m; the first input end of the third multiplier receives $x_0$, and the output end of the third multiplier outputs the product p3 of $x_0$ and $x_1$; the output end of the first adder outputs the difference t = p3 - p2; the output end of the second adder outputs the difference t - m between t and m; and the output end of the two-way multiplexer selects the result to be output according to the magnitude relation between t and m. The invention reduces the hardware cost of the modular multiplier circuit.

Description

Modulo multiplier circuit, FPGA circuit and ASIC module

Technical Field

The present invention relates to the field of circuit technologies, and in particular, to a modulo multiplier circuit, an FPGA circuit, and an ASIC module.

Background

Modular multiplication is an important operation in cryptography. It can be described by the formula $x_0 \cdot x_1 \bmod m$, where $x_0$ and $x_1$ are the operands of the modular multiplication, m is the modulus, and n is the bit length of $x_0$ and $x_1$. To avoid expensive division operations, Barrett's algorithm is typically used to implement modular multiplication of large numbers.

The Barrett Reduction algorithm is a modular reduction algorithm proposed by Barrett in 1986. Its basic steps are as follows:

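(1) $x = x_0 \cdot x_1$

(2) $\mu = \lfloor 2^{2n} / m \rfloor$

(3) $q = \lfloor x \cdot \mu / 2^{2n} \rfloor$

(4) $r = x - q \cdot m$

(5) if $r \ge m$:

(6) $\quad r = r - m$

(7) return $r$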

Since $2^{2n}$ is a power of 2, division by $2^{2n}$ can be replaced by a simple bit operation (a right shift). Barrett Reduction therefore uses only a limited number of multiplications and completely avoids expensive division. Steps (1) to (7) above compute $x_0 \cdot x_1 \bmod m$.
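As a reference point for the comparison that follows, steps (1) to (7) can be sketched in Python. This is a minimal model under our own assumptions, not the patent's implementation: the function name `barrett_mulmod` is hypothetical, Python's unbounded integers stand in for hardware registers, n is taken as the bit length of m, and the operands satisfy 0 <= x0, x1 < m. Under these assumptions the value r after step (4) lies in [0, 2m), so one conditional subtraction is enough.

```python
def barrett_mulmod(x0: int, x1: int, m: int) -> int:
    """Textbook Barrett modular multiplication following steps (1)-(7).

    Assumes 0 <= x0, x1 < m; n is taken as the bit length of m.
    """
    if not (0 <= x0 < m and 0 <= x1 < m):
        raise ValueError("operands must satisfy 0 <= x0, x1 < m")
    n = m.bit_length()
    x = x0 * x1                    # (1) full 2n-bit product
    mu = (1 << (2 * n)) // m       # (2) depends only on m, so it can be precomputed
    q = (x * mu) >> (2 * n)        # (3) division by 2^(2n) becomes a right shift
    r = x - q * m                  # (4) here 0 <= r < 2m
    if r >= m:                     # (5)
        r -= m                     # (6)
    return r                       # (7)


print(barrett_mulmod(123, 456, 1009))  # prints 593
```

Note that this general form needs the full product x and the wide product x * mu, which is exactly the cost the fused design below tries to avoid.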

The inventors have found that circuit implementations of the above Barrett Reduction modular multiplier fall broadly into two approaches. One is a general-purpose implementation, which places no constraint on the modulus m; it meets the needs of many scenarios, but its circuit structure is complex, it leaves little room for optimization, and its performance is hard to improve. The other is a specialized implementation, which imposes special constraints on m; a highly optimized circuit can be derived from these constraints, but because they are strong, it is only suitable for certain specific application scenarios.

Further research by the inventors found that in both approaches, the modular multiplier circuit performs multiplication and modular reduction in two independent modules: it first computes the intermediate result $x = x_0 \cdot x_1$ and then computes $x \bmod m$. As shown in FIG. 1, in the prior-art modular multiplier architecture the multiplier (MUL) module and the modular reduction (Barrett Reduction) module are two independent, separate modules. If the boundary between the two modules can be removed and the modules fused, redundant computation and unnecessary intermediate results such as $x$ can be eliminated, which reduces hardware cost and improves computational throughput.

Therefore, how to reduce the hardware cost of the modulo multiplier circuit is a technical problem to be solved in the art.

Disclosure of Invention

The invention aims to provide a modular multiplier circuit, an FPGA circuit and an ASIC module, which are used for solving the technical problems in the prior art.

In one aspect, the present invention provides a modular multiplier circuit for achieving the above object.

The modular multiplier circuit includes: a first multiplier, a second multiplier, a third multiplier, a first adder, a second adder and a two-way multiplexer, wherein: the first input end of the first multiplier receives $x_0$, the second input end of the first multiplier receives $x_1'$, and the output end of the first multiplier outputs the product p1 of $x_0$ and $x_1'$, where $x_0$ and $x_1$ are the operands of the modular multiplication, m is the modulus used for the modular multiplication, $x_1' = \lfloor x_1 \cdot 2^n / m \rfloor$, $x_1$ is a constant, and n is the bit length of $x_0$ and $x_1$; the first input end of the second multiplier is connected to the output end of the first multiplier, the second input end of the second multiplier receives m, and the output end of the second multiplier outputs the product p2 of p1 and m; the first input end of the third multiplier receives $x_0$, the second input end of the third multiplier receives $x_1$, and the output end of the third multiplier outputs the product p3 of $x_0$ and $x_1$; the first input end of the first adder is connected to the output end of the second multiplier, the second input end of the first adder is connected to the output end of the third multiplier, and the output end of the first adder outputs the difference t = p3 - p2; the first input end of the second adder is connected to the output end of the first adder, the second input end of the second adder receives m, and the output end of the second adder outputs the difference t - m between t and m; the first input end of the two-way multiplexer is connected to the output end of the first adder, the second input end of the two-way multiplexer is connected to the output end of the second adder, and the output end of the two-way multiplexer selects the result to be output according to the magnitude relation between t and m: when t >= m, the two-way multiplexer outputs t - m, and when t < m, it outputs t.

Further, the first multiplier outputs the highest n bits of the product p1, the second multiplier outputs the lowest n bits of the product p2, and the third multiplier outputs the lowest n bits of the product p3.

Further, the first adder is configured to invert the output of the second multiplier and then perform the addition with a carry-in of 1.

Further, the modular multiplier circuit further comprises a pre-calculation unit for calculating $x_1' = \lfloor x_1 \cdot 2^n / m \rfloor$, wherein the second input end of the first multiplier is connected to the output end of the pre-calculation unit.

In another aspect, to achieve the above object, the present invention provides an FPGA circuit, which includes any of the modulo multiplier circuits provided by the present invention, wherein the multipliers in the modulo multiplier circuits are implemented by a DSP.

In a further aspect, to achieve the above object, the present invention provides an ASIC module comprising a Barrett circuit comprising any of the modular multiplier circuits provided by the present invention.

For application scenarios in which one operand of the modular multiplication is constant, the modular multiplier circuit, FPGA circuit and ASIC module provided by the invention remove the boundary between the multiplication module and the modular reduction module of a hardware modular multiplier. They do not compute the full product of x0 and x1; instead, they merge the two modules into one (namely the modular multiplier circuit provided by the invention), built from the first multiplier, the second multiplier, the third multiplier, the first adder, the second adder and the two-way multiplexer. This reduces redundant computation and unnecessary intermediate results in the modular multiplier, lowers the hardware cost of the modular multiplier circuit, the FPGA circuit and the ASIC module, and improves computational throughput.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:

FIG. 1 is a block diagram of a prior-art modular multiplier;

FIG. 2 is a circuit diagram of a modular multiplier circuit according to a first embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The inventors have found that in computer systems, multiplier implementations generally require more circuit area, which also dominates circuit power consumption. If the occupation of the multipliers by the modulo multipliers can be reduced, this helps to increase the throughput of the system, reduce the power consumption and hardware area, and therefore the focus of the invention is to optimize the number of multipliers used in the modulo multiplier circuit.

The inventors further found that in application scenarios for modular multiplier circuits, i.e. scenarios in which modular multiplication must be performed, one of the operands is typically constant. For example: in DNN inference tasks, the weights of the DNN model are constant; in feature comparison tasks, the feature library is constant; in encryption/decryption tasks, the key is constant; and in the number-theoretic transform (NTT) or FFT, the twiddle factors are constant. Modular multiplication can therefore exploit this property to optimize the hardware circuit.

Based on this, assuming that $x_1$ is constant, line (3) of the general Barrett Reduction algorithm can be rewritten as $q = \lfloor x_0 \cdot x_1 \cdot \mu / 2^{2n} \rfloor$.

Obviously, $x_1 \cdot \mu$ is constant, so $x_1$ can be merged into line (2), which is rewritten as $\mu' = \lfloor x_1 \cdot 2^{2n} / m \rfloor$, so that line (3) becomes $q = \lfloor x_0 \cdot \mu' / 2^{2n} \rfloor$.

Meanwhile, error analysis shows that $\mu'$ can be further rewritten as $x_1' = \lfloor x_1 \cdot 2^n / m \rfloor$, with $q = \lfloor x_0 \cdot x_1' / 2^n \rfloor$. This reduces the bit width of the precomputed constant and therefore the bit width required by the multiplier (the cost of a multiplier is proportional to the square of its operand bit width). The fused modular multiplier algorithm can then be derived as follows:

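(8) Input: $x_0$, $x_1$ (constant) and $m$

(9) $x_1' = \lfloor x_1 \cdot 2^n / m \rfloor$ (precomputed)

(10) $p_1 = x_0 \cdot x_1'$

(11) $p_1' = \lfloor p_1 / 2^n \rfloor$

(12) $p_2 = p_1' \cdot m \bmod 2^n$

(13) $p_3 = x_0 \cdot x_1 \bmod 2^n$

(14) $t = (p_3 - p_2) \bmod 2^n$

(15) if $t \ge m$, return $t - m$; otherwise return $t$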

The above algorithm does not compute the full product of x0 and x1; instead, it fuses the multiplication of x0 and x1 with the reduction modulo m, which is why it is called the fused modular multiplier algorithm. It computes $x_0 \cdot x_1 \bmod m$.
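The fused algorithm (8) to (15) can be modelled in a few lines of Python. This is a sketch under our own assumptions rather than the patent's implementation: the function names are hypothetical, masking with 2^n - 1 emulates n-bit truncation, and we require 0 <= x0, x1 < m and 2m <= 2^n. The last condition is not stated in the source; we add it because t can reach 2m - 1 before the final subtraction, and it must fit in n bits for step (14) to be exact.

```python
def precompute_x1p(x1: int, m: int, n: int) -> int:
    """Step (9): x1' = floor(x1 * 2^n / m); depends only on the constant x1."""
    return (x1 << n) // m


def fused_mulmod(x0: int, x1: int, x1p: int, m: int, n: int) -> int:
    """Steps (10)-(15): x0 * x1 mod m using only multiplies, shifts and masks.

    Sketch assumptions: 0 <= x0, x1 < m and 2 * m <= 2**n (one spare bit).
    """
    mask = (1 << n) - 1
    p1 = x0 * x1p                   # (10) n x n -> 2n product
    p1_hi = p1 >> n                 # (11) keep the highest n bits
    p2 = (p1_hi * m) & mask         # (12) keep the lowest n bits
    p3 = (x0 * x1) & mask           # (13) keep the lowest n bits
    t = (p3 - p2) & mask            # (14) n-bit difference, 0 <= t < 2m
    return t - m if t >= m else t   # (15) one conditional subtraction


# Usage: x1 is fixed (e.g. a weight, key or twiddle factor), so x1' is computed once.
n, m, x1 = 16, 12289, 1479          # example parameters chosen for illustration
x1p = precompute_x1p(x1, m, n)
print([fused_mulmod(x0, x1, x1p, m, n) for x0 in (0, 1, 2, 12288)])
# prints [0, 1479, 2958, 10810]
```

Only the precomputation involves a division, and it is done once per constant; each subsequent modular multiplication uses three n-bit-input multiplications.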

Comparing the fused modulo multiplier algorithm with a general Barrett Reduction algorithm:

For convenience, when describing how a multiplier changes operand bit widths we write n × n → k for a multiplier that multiplies two n-bit operands to obtain a k-bit result.

When estimating multiplier cost (area or number of logic gates), we take the cost of an n × n → n multiplier as 1 unit.

On this basis, the cost of an n × n → 2n multiplier is estimated as 2;

the cost of a 2n × 2n → 2n multiplier as 4;

and the cost of a 2n × 2n → 4n multiplier as 8.

Analyzing the cost of the three multipliers in each circuit shows that the multiplier bit widths in the fused modular multiplier are significantly reduced.

For example, computing $x = x_0 \cdot x_1$ in a general-purpose modular multiplier requires an n × n → 2n multiplier, whose cost is twice that of an n × n → n multiplier;

as another example, in a general-purpose modular multiplier x and μ are both 2n bits wide, so computing $x \cdot \mu$ requires a 2n × 2n → 4n multiplier with cost 8. In the fused modular multiplier circuit, however, $x_0$ and $x_1'$ are both n bits wide, so the multiplier for $x_0 \cdot x_1'$ is only n × n → 2n with cost 2, far lower than in the general-purpose modular multiplier.

Thus, with the fused modular multiplier algorithm, the relative multiplier cost (taking an n × n → n multiplier as the unit cost) falls from 12 (= 2 + 8 + 2) to 4 (= 2 + 1 + 1).
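The unit-cost bookkeeping above can be reproduced in Python. The table below only encodes the costs stated in the text (it is a lookup table, not a gate-level area model); the labels are ours, and we assume the third term of the general design's 2 + 8 + 2 corresponds to the q · m product of step (4).

```python
from fractions import Fraction

# Unit costs stated in the text; an n x n -> n multiplier costs 1.
# Keys are (input width, input width, output width) in multiples of n.
UNIT_COST = {
    (1, 1, 1): 1,   # n x n -> n   (LSB MUL)
    (1, 1, 2): 2,   # n x n -> 2n
    (2, 2, 2): 4,   # 2n x 2n -> 2n
    (2, 2, 4): 8,   # 2n x 2n -> 4n
}

GENERAL_BARRETT = {"x = x0*x1": (1, 1, 2), "x*mu": (2, 2, 4), "q*m": (1, 1, 2)}
FUSED = {"U0: x0*x1'": (1, 1, 2), "U1: p1'*m": (1, 1, 1), "U2: x0*x1": (1, 1, 1)}


def total_cost(design: dict) -> int:
    """Sum the unit costs of all multipliers in a design."""
    return sum(UNIT_COST[shape] for shape in design.values())


general, fused = total_cost(GENERAL_BARRETT), total_cost(FUSED)
saving = 1 - Fraction(fused, general)           # fraction of multiplier cost removed
speedup = Fraction(general, fused)              # throughput ratio at fixed multiplier budget
print(general, fused, saving, speedup)          # prints 12 4 2/3 3
```

The 2/3 saving and the factor of 3 are exactly the figures quoted for Example 1; they hold only within this simplified cost model.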

The invention realizes a modular multiplier circuit, an FPGA circuit and an ASIC module based on the fused modular multiplier algorithm. Specific embodiments of the modular multiplier circuit, FPGA circuit and ASIC module provided by the present invention are described in detail below.

Example 1

The first embodiment of the invention provides a modular multiplier circuit that removes the boundary between the multiplication module and the modular reduction module of a hardware modular multiplier and merges the two into a single module, which reduces redundant computation and unnecessary intermediate results, lowers the hardware cost of the modular multiplier circuit, and improves computational throughput. FIG. 2 is a circuit diagram of the modular multiplier circuit according to the first embodiment of the present invention. As shown in FIG. 2, the modular multiplier circuit includes a first multiplier U0, a second multiplier U1, a third multiplier U2, a first adder U3, a second adder U4 and a two-way multiplexer U5.

The first multiplier U0 is an n-by-n multiplier. Its first input end receives $x_0$, its second input end receives $x_1'$, and its output end outputs the product p1 of $x_0$ and $x_1'$, where $x_0$ and $x_1$ are the operands of the modular multiplication, m is the modulus used for the modular multiplication, $x_1' = \lfloor x_1 \cdot 2^n / m \rfloor$, $x_1$ is a constant, and n is the bit length of $x_0$ and $x_1$. Optionally, the first multiplier U0 outputs the highest n bits of the product p1. Optionally, in an encryption algorithm implemented with the modular multiplier circuit, $x_0$ and $x_1$ are the operands of the modular multiplication and have different physical meanings at different stages; for example, $x_0$ and $x_1$ may be plaintext, ciphertext, a public key and/or a private key.

The second multiplier U1 is an n × n multiplier. Its first input end is connected to the output end of the first multiplier U0, its second input end receives m, and its output end outputs the product p2 of p1 and m, i.e. it multiplies the output of U0 by m. Optionally, the second multiplier U1 outputs the lowest n bits of the product p2.

The third multiplier U2 is an n-by-n multiplier. Its first input end receives $x_0$, its second input end receives $x_1$, and its output end outputs the product p3 of $x_0$ and $x_1$. Optionally, the third multiplier U2 outputs the lowest n bits of the product p3.

The first adder U3 is an n-bit adder. Its first input end is connected to the output end of the second multiplier U1, its second input end is connected to the output end of the third multiplier U2, and its output end outputs the difference t = p3 - p2 between the outputs of U2 and U1. Optionally, the first adder U3 inverts the output of the second multiplier U1 and then performs the addition with a carry-in of 1, converting the subtraction into an addition.

The second adder U4 is an n-bit adder. Its first input end is connected to the output end of the first adder U3, its second input end receives m, and its output end outputs the difference t - m between t and m. Optionally, like the first adder U3, the second adder U4 inverts its subtrahend m and then performs the addition with a carry-in of 1, converting the subtraction into an addition.

The first input end of the two-way multiplexer U5 is connected with the output end of the first adder U3, the second input end of the two-way multiplexer U5 is connected with the output end of the second adder U4, and the output end of the two-way multiplexer U5 selects a result to be output according to the size relation of t and m, wherein when t > =m, the two-way multiplexer U5 outputs t-m, namely, the output result of the second adder U4, and when t < m, the two-way multiplexer U5 outputs t, namely, the output result of the first adder U3.
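The U0 to U5 datapath can be modelled bit-accurately as below. This is a hypothetical software model, not RTL from the patent: U3 and U4 are both written as "invert the subtrahend, add with carry-in 1", and we assume (the text does not say this) that U5's select signal is U4's carry-out, which for unsigned n-bit operands is 1 exactly when t >= m. As in the algorithm sketch, we also assume 0 <= x0, x1 < m and 2m <= 2^n.

```python
def lsb_mul(a: int, b: int, n: int) -> int:
    """LSB MUL (U1, U2): keep the lowest n bits of the product."""
    return (a * b) & ((1 << n) - 1)


def msb_mul(a: int, b: int, n: int) -> int:
    """MSB MUL (U0): keep the highest n bits of the 2n-bit product."""
    return (a * b) >> n


def sub_invert_carry(a: int, b: int, n: int) -> tuple[int, int]:
    """n-bit a - b computed as a + ~b + 1 (invert b, carry-in 1).

    Returns (difference mod 2^n, carry_out); carry_out is 1 exactly when a >= b.
    """
    mask = (1 << n) - 1
    s = a + (~b & mask) + 1
    return s & mask, s >> n


def modmul_datapath(x0: int, x1: int, x1p: int, m: int, n: int) -> int:
    p1 = msb_mul(x0, x1p, n)                  # U0: high half of x0 * x1'
    p2 = lsb_mul(p1, m, n)                    # U1: low half of p1 * m
    p3 = lsb_mul(x0, x1, n)                   # U2: low half of x0 * x1
    t, _ = sub_invert_carry(p3, p2, n)        # U3: p3 + ~p2 + 1
    t_m, t_ge_m = sub_invert_carry(t, m, n)   # U4: t + ~m + 1, carry-out = (t >= m)
    return t_m if t_ge_m else t               # U5: two-way multiplexer


print(modmul_datapath(5, 7, (7 << 8) // 97, 97, 8))  # prints 35
```

Using U4's carry-out as the select avoids a separate comparator; a design could equally compare t and m directly, as the text describes.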

The modular multiplier circuit provided by this embodiment greatly reduces the hardware cost of the circuit. The hardware cost of a modular multiplier circuit depends mainly on the number and cost of its multipliers. Taking the cost of an n × n → n multiplier as the unit cost, the hardware cost of a conventional modular multiplier circuit is 12, while that of the modular multiplier circuit provided by the invention is 4, a reduction of (12 - 4)/12, or about 67%. Fewer multipliers also mean a smaller circuit area and lower overall power consumption. In addition, because multipliers are the main factor limiting system throughput, reducing the circuit's multiplier usage means that, for a fixed total multiplier budget, system throughput can be roughly tripled (12/4 = 3). In a conventional modular multiplication circuit, the module with the highest delay is the computation of $x \cdot \mu$, which is implemented by a 2n × 2n → 4n multiplier located on the critical path; because of its high computational complexity, the circuit delay is high and the clock frequency is generally hard to raise. In the present invention, $x_0 \cdot x_1'$ is computed by an n × n → 2n multiplier, so the computational complexity is greatly reduced and the carry chain is much shorter; the delay therefore improves significantly and higher operating frequencies are easier to reach.

Optionally, in one embodiment, the modular multiplier circuit further comprises a pre-calculation unit for calculating $x_1' = \lfloor x_1 \cdot 2^n / m \rfloor$, wherein the second input end of the first multiplier U0 is connected to the output end of the pre-calculation unit, so that $x_1'$ is computed by the pre-calculation unit.
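A software model of the pre-calculation unit, together with the bit-width argument behind replacing $\mu'$ by $x_1'$, might look as follows. The function names and the example parameters are hypothetical and chosen by us; the printed widths apply to this one example only. Because $x_1 < m$, $x_1'$ always fits in n bits, whereas $\mu'$ can need up to about 2n bits.

```python
def mu_prime(x1: int, m: int, n: int) -> int:
    """Rewritten line (2): floor(x1 * 2^(2n) / m); up to about 2n bits wide."""
    return (x1 << (2 * n)) // m


def x1_prime(x1: int, m: int, n: int) -> int:
    """Pre-calculation unit: floor(x1 * 2^n / m); fits in n bits because x1 < m."""
    if not 0 <= x1 < m:
        raise ValueError("expected 0 <= x1 < m")
    return (x1 << n) // m


def precompute_table(constants: list[int], m: int, n: int) -> list[tuple[int, int]]:
    """Computed once, offline: (x1, x1') pairs later fed to U2 and U0."""
    return [(c, x1_prime(c, m, n)) for c in constants]


n, m = 16, 12289                  # example parameters chosen for illustration
x1 = m - 1
print(mu_prime(x1, m, n).bit_length(), x1_prime(x1, m, n).bit_length())  # prints 32 16
print(precompute_table([0, 1, 2], m, n))  # prints [(0, 0), (1, 5), (2, 10)]
```

Since the division happens only here, it can be performed in software or in a slow, shared unit without affecting the throughput of the U0 to U5 datapath.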

The first multiplier U0 in FIG. 2 implements lines 10 and 11 of the fused modular multiplier algorithm. U0 receives $x_0$ (n bits wide) and the precomputed $x_1'$ (n bits wide), computes their 2n-bit product p1, discards the lowest n bits and keeps only the highest n bits (p1'), so U0 can be regarded as an MSB MUL (high-order multiplier). For convenience we describe its input/output bit widths as n × n → 2n.

The second multiplier U1 section implements line 12 of the fused modulo multiplier algorithm. It calculates the product of p1' (n bits) and m (n bits), but discards the highest n bits of the product, leaving the lowest n bits (p 2) of the product. U1 may be referred to as LSB MUL (low order multiplier).

The third multiplier U2 is also an LSB MUL (low-order multiplier). It implements line 13 of the fused modular multiplier algorithm: it computes the product of the n-bit x0 and x1 and keeps the lowest n bits (p3) of the product.

Both U1 and U2 are LSB MULs (low-order multipliers), whose input/output bit widths we describe as n × n → n.

The first adder U3 is a full adder responsible for computing the difference between p3 and p2 (line 14 of the algorithm). Since data in a computer are represented in two's complement, negating a number amounts to inverting all its bits and adding 1; p3 - p2 can therefore be computed as p3 plus the inverted p2 plus 1. That is, the first adder (U3) inverts the output of the second multiplier (U1) and then performs the addition with a carry-in of 1.

The second adder U4 and the two-way multiplexer U5 implement line 15 of the fused modular multiplier algorithm. U4 computes the difference between t and m, and U5, a two-way multiplexer, selects according to the magnitude relation between t and m: it outputs t - m if t >= m, and t otherwise.
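Why a single conditional subtraction in U4/U5 is enough can be seen from a standard error bound for this kind of precomputed-quotient reduction (the same bound that underlies Shoup-style modular multiplication). The derivation below is our own sketch, not text from the patent. Write $x_1 \cdot 2^n = x_1' m + s$ with $0 \le s < m$, let $q = \lfloor x_0 x_1' / 2^n \rfloor$ (the value p1' produced by U0) and $T = x_0 x_1 - q m$, and assume $0 \le x_0 < 2^n$:

$$
\begin{aligned}
q &\le \frac{x_0 x_1'}{2^n} = \frac{x_0 x_1}{m} - \frac{x_0 s}{m\,2^n} \le \frac{x_0 x_1}{m}
  &&\Longrightarrow\quad T \ge 0,\\
q &> \frac{x_0 x_1'}{2^n} - 1 = \frac{x_0 x_1}{m} - \frac{x_0 s}{m\,2^n} - 1 > \frac{x_0 x_1}{m} - 2
  &&\Longrightarrow\quad T < 2m,
\end{aligned}
$$

where the last inequality uses $x_0 s < 2^n m$. Hence $0 \le T < 2m$. Since $p_3 - p_2 \equiv T \pmod{2^n}$, U3 produces $t = T \bmod 2^n$, which equals $T$ whenever $2m \le 2^n$; under that extra condition, which we state here as an assumption, comparing $t$ with $m$ and subtracting at most once yields $x_0 x_1 \bmod m$.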

Example two

The second embodiment of the present invention provides an FPGA circuit, i.e. a digital integrated circuit whose internal structure can be changed by programming. The FPGA circuit includes any one of the modular multiplier circuits provided in the first embodiment of the present invention: the circuit unit that performs modular multiplication in the FPGA circuit uses the modular multiplier circuit of the present invention, and the multipliers in the modular multiplier circuit are implemented with DSPs.

With the FPGA circuit provided by this embodiment, the modular multiplier circuit requires fewer multipliers, so under the cost estimate above its multiplier (DSP) cost falls by about 67%; with the number of DSPs fixed, the computing performance of the FPGA circuit can be increased correspondingly.

Example III

The third embodiment of the present invention provides an ASIC (application-specific integrated circuit) module including a Barrett circuit. The Barrett circuit includes any one of the modular multiplier circuits provided in the first embodiment of the present invention, and the circuit unit that performs modular multiplication in the Barrett circuit uses the modular multiplier circuit of the present invention.

With the ASIC module provided in this embodiment, more Barrett circuits can be instantiated on the chip within a limited chip area or power budget, roughly tripling throughput. Alternatively, when the throughput target is met, the chip area can be reduced by about two-thirds, which reduces chip cost and power consumption.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment.

The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims

1. A modulo multiplier circuit, comprising: a first multiplier (U0), a second multiplier (U1), a third multiplier (U2), a first adder (U3), a second adder (U4) and a two-way multiplexer (U5), wherein:

the first input end of the first multiplier (U0) receives $x_0$, the second input end of the first multiplier (U0) receives $x_1'$, and the output end of the first multiplier (U0) outputs the product p1 of $x_0$ and $x_1'$, wherein $x_0$ and $x_1$ are the operands of the modular multiplication, m is the modulus used for the modular multiplication, $x_1' = \lfloor x_1 \cdot 2^n / m \rfloor$, $x_1$ is a constant, and n is the bit length of $x_0$ and $x_1$;

the first input end of the second multiplier (U1) is connected with the output end of the first multiplier (U0), the second input end of the second multiplier (U1) receives m, and the output end of the second multiplier (U1) outputs a product p2 of p1 and m;

the first input end of the third multiplier (U2) receives $x_0$, the second input end of the third multiplier (U2) receives $x_1$, and the output end of the third multiplier (U2) outputs the product p3 of $x_0$ and $x_1$;

a first input end of the first adder (U3) is connected to the output end of the second multiplier (U1), a second input end of the first adder (U3) is connected to the output end of the third multiplier (U2), and the output end of the first adder (U3) outputs the difference t = p3 - p2;

the first input end of the second adder (U4) is connected with the output end of the first adder (U3), the second input end of the second adder (U4) receives m, and the output end of the second adder (U4) outputs a difference t-m between t and m;

the first input end of the two-way multiplexer (U5) is connected with the output end of the first adder (U3), the second input end of the two-way multiplexer (U5) is connected with the output end of the second adder (U4), and the output end of the two-way multiplexer (U5) selects a result required to be output according to the size relation of t and m, wherein when t > =m, the two-way multiplexer (U5) outputs t-m, and when t < m, the two-way multiplexer (U5) outputs t.

2. A modular multiplier circuit according to claim 1, characterized in that the first multiplier (U0) outputs the highest n bits of the product p1, the second multiplier (U1) outputs the lowest n bits of the product p2, and the third multiplier (U2) outputs the lowest n bits of the product p3.

3. A modulo multiplier circuit according to claim 1, wherein said first adder (U3) is arranged to perform an addition operation with carry 1 after inverting the output of said second multiplier (U1).

4. The modular multiplier circuit of claim 1, further comprising: a pre-calculation unit for calculating $x_1' = \lfloor x_1 \cdot 2^n / m \rfloor$, wherein a second input end of the first multiplier (U0) is connected to an output end of the pre-calculation unit.

5. An FPGA circuit comprising a modulo multiplier circuit according to any of claims 1 to 4, wherein the multipliers in the modulo multiplier circuit are implemented by a DSP.

6. An ASIC module comprising a Barrett circuit comprising a modulo multiplier circuit according to any of claims 1 to 4.