CN117313173A

CN117313173A - Modular multiplication operation method, modular multiplication module and homomorphic processing unit

Info

Publication number: CN117313173A
Application number: CN202311175014.2A
Authority: CN
Inventors: 周朕; 宋捷
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2023-09-12
Filing date: 2023-09-12
Publication date: 2023-12-29

Abstract

The embodiments of this specification provide a modular multiplication method and a modular multiplication module. The modular multiplication module is adapted to moduli with a highest bit width of n and comprises a mapping unit formed based on a hardware look-up table (LUT); the selection path among the input paths of the mapping unit supports m selection signals, where m = n/2 + 1. The modular multiplication method comprises the following steps: a value s is selected from m possible values according to the bit width k of the modulus q. A first bit string corresponding to the first product of operands a and b is calculated. The first bit string is input to the mapping unit and its selection path is set to enable the path corresponding to s, yielding a second bit string equal to the first bit string shifted right by s bits. A third bit string corresponding to the product of the second bit string and a pre-calculated target value is calculated. The third bit string is shifted right by t bits through a wiring connection to obtain a fourth bit string, where t is a constant. From the fourth bit string, the modulo result of the first product with respect to the modulus q is determined.

Description

Modular multiplication operation method, modular multiplication module and homomorphic processing unit

Technical Field

One or more embodiments of this specification relate to cryptographic processing units implemented as integrated circuits, and more particularly to a modular multiplication method and a modular multiplication module therein.

Background

With growing awareness of data security and privacy protection, privacy computing, as a new secure computing paradigm, is becoming one of the mainstream approaches to data processing and analysis. Privacy computing operates directly on encrypted or anonymized data without exposing the original data, thereby protecting data privacy and security, and has broad application prospects. It is already widely applied in fields such as artificial intelligence, finance and healthcare, and is becoming an important supporting technology of the future digital society. Homomorphic encryption is one of the mainstream technologies in privacy computing: it can perform various common computations on encrypted data while ensuring the correctness of the results and the privacy of the data. Homomorphic encryption is widely used in cloud computing, data privacy protection, secure multiparty computation and other scenarios, and is an important technical means of protecting private data and achieving secure computation.

To accelerate homomorphic encryption operations, it has been proposed to develop a homomorphic processing unit (HPU) chip dedicated to the computations involved in homomorphic encryption.

Current mainstream homomorphic encryption schemes are based on lattice cryptosystems, whose underlying mathematical problem is mainly the learning with errors (LWE) problem over polynomial rings. The main operations are modular addition, subtraction, multiplication and the like. Among these, the implementation efficiency of modular multiplication is one of the key factors determining the overall performance of a homomorphic encryption scheme, and also an important factor affecting HPU chip performance.

Disclosure of Invention

One or more embodiments of this specification describe a modular multiplication method, a modular multiplication module and an HPU that reduce hardware resource consumption and improve performance.

According to a first aspect, there is provided a modular multiplication method performed by a modular multiplication module adapted to moduli with a highest bit width of n. The modular multiplication module comprises a first multiplier, a second multiplier, a unit group of mapping units formed based on hardware look-up tables (LUTs), and a preset wiring connection; the input paths of each mapping unit in the unit group comprise a data path and a selection path, the selection path supporting m selection signals, where m = n/2 + 1. The method comprises the following steps:

selecting a first value s from m possible values according to the actual bit width k of the current modulus q, where k ≤ n;

calculating, using the first multiplier, a first bit string corresponding to a first product of a first operand a and a second operand b;

inputting the first bit string into the data path of the mapping units and setting their selection path to enable the path corresponding to the first value s, so as to obtain a second bit string equal to the first bit string shifted right by s bits;

calculating, using the second multiplier, a third bit string corresponding to a second product of the second bit string and a pre-calculated target value;

shifting the third bit string right by t bits (the second value) through the wiring connection to obtain a fourth bit string, where the second value t is a fixed value;

from the fourth bit string, a modulo result of the first product with respect to the current modulus q is determined.

In one specific implementation, determining the first value s specifically includes:

if the actual bit width k ≤ n/2, determining the first value s = 0;

if the actual bit width k > n/2, determining the first value s = k − 2.

In one embodiment, the unit group includes 2n mapping units, each implemented based on hardware LUTs, whose input paths each include the data path and the selection path; each mapping unit outputs one bit of the second bit string.

In a more specific embodiment, a single mapping unit comprises a plurality of stages of hardware LUTs.

In one example, the second value t=n+3.

According to one embodiment, determining the modulo result of the first product with respect to the current modulus q based on the fourth bit string specifically includes: calculating, using a third multiplier, a third product of the fourth bit string and the current modulus q; calculating, using an adder, the difference between the first product and the third product to obtain a first difference; and comparing the first difference with the current modulus q to obtain the modulo result.

In various embodiments, either the first operand a is less than 2ⁿ (i.e., its bit width is at most n) and the second operand b is 1, or the first operand a and the second operand b are each smaller than the current modulus q.

In one embodiment, the number of selection lines of each mapping unit is ⌈log₂ m⌉.

According to a second aspect, there is provided a modular multiplication module for performing modular multiplication with moduli having a highest bit width of n. The modular multiplication module comprises a first multiplier, a second multiplier, a unit group of mapping units formed based on hardware look-up tables (LUTs), and a preset wiring connection; the input paths of the mapping units in the unit group comprise a data path and a selection path, the selection path supporting m selection signals, where m = n/2 + 1. During the modular multiplication operation:

the first multiplier is configured to calculate a first bit string corresponding to a first product of a first operand a and a second operand b;

the data path of each mapping unit receives the first bit string, and the selection path is set to enable the path corresponding to the first value s, yielding a second bit string equal to the first bit string shifted right by s bits; the first value s is selected from the m possible values according to the actual bit width k of the current modulus q, where k ≤ n;

the second multiplier is configured to calculate a third bit string corresponding to a second product of the second bit string and a pre-calculated target value;

the wiring connection shifts the third bit string right by t bits (the second value) to obtain a fourth bit string, where the second value t is a fixed value; the fourth bit string is used to determine the modulo result of the first product with respect to the current modulus q.

According to a third aspect, there is provided a homomorphic processing unit HPU chip comprising the modular multiplication module of the second aspect.

According to a fourth aspect, there is provided a computing device comprising the HPU chip of the third aspect.

The embodiments of this specification propose an improved modular multiplication method and modular multiplication module in which the hardware design of the two shift operations is improved by re-choosing the values of the shift parameters. Specifically, for the first shift, the number of selection signals that the selection paths of the hardware LUTs must support is reduced; for the second shift, the LUT solution is replaced by direct wiring. The overall scheme consumes fewer hardware resources while preserving computational accuracy, has a simpler hardware structure, and improves performance.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 shows a schematic diagram of a data shift implemented by a mapping unit;

FIG. 2 illustrates a method flow diagram for modular multiplication operations, according to one embodiment;

FIG. 3 illustrates a schematic diagram of a modular multiplication module, according to one embodiment.

Detailed Description

The following describes the scheme provided in the present specification with reference to the drawings.

Homomorphic encryption is one of the mainstream technologies in privacy computing and allows mathematical computations to be performed on data while it remains encrypted. Homomorphic encryption algorithms can be further divided into partially homomorphic encryption and fully homomorphic encryption: partially homomorphic encryption supports one homomorphic operation (e.g., homomorphic addition or homomorphic multiplication), while fully homomorphic encryption supports both addition and multiplication. Most homomorphic encryption operations are performed in a modular space, so they involve a large number of the four basic arithmetic operations in the modular sense.

Specifically, the modulo operation divides an integer a by an integer q to obtain the remainder r, i.e., a ÷ q = d … r, or equivalently $a = q \times d + r$ with $0 \le r < q$. Here a is called the reduced number (or dividend) and q the modulus (or divisor); the modulo operator is usually written "mod", i.e., r = a mod q. For example, 10 mod 3 = 1 because 10 = 3 × 3 + 1. The modulo operation is used extensively in homomorphic encryption, and its efficiency is key to high-performance homomorphic encryption schemes.

Building on the modulo operation, the residue number system (RNS) has been proposed. In RNS, an integer is represented by its residues with respect to a set of moduli, which are typically primes and whose product is greater than the largest number to be represented. For addition, subtraction or multiplication of two integers, the residues can be operated on separately in each modulus channel, and the final result is equivalent. The main advantage of RNS is that it handles modular arithmetic with a large modulus efficiently, and it is very widely used in homomorphic encryption.
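As an illustration of the RNS idea, the following minimal Python sketch (not part of the described hardware) represents integers by residues modulo a few small primes, multiplies channel by channel, and reconstructs the result with the Chinese remainder theorem. The moduli and the names `to_rns` and `from_rns` are hypothetical choices for illustration; `pow(x, -1, m)` requires Python 3.8 or later.

```python
from math import prod

def to_rns(x: int, moduli: list[int]) -> list[int]:
    """Represent x by its residues modulo each of the (pairwise coprime) moduli."""
    return [x % m for m in moduli]

def from_rns(residues: list[int], moduli: list[int]) -> int:
    """Reconstruct x (modulo the product of the moduli) with the Chinese remainder theorem."""
    big_m = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        mi = big_m // m
        x += r * mi * pow(mi, -1, m)   # pow(mi, -1, m) is the inverse of mi modulo m
    return x % big_m

moduli = [7, 11, 13]                   # illustrative small primes, product 1001
x, y = 123, 456
xr, yr = to_rns(x, moduli), to_rns(y, moduli)
zr = [(a * b) % m for a, b, m in zip(xr, yr, moduli)]   # independent per-channel products
print(from_rns(zr, moduli), (x * y) % 1001)             # 32 32
```

Each channel only ever handles numbers below its own small modulus, which is what makes RNS attractive for hardware with a bounded modulus width n.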

For computing the modulo operation, the Barrett reduction algorithm is an efficient method. The main difficulty lies in r = a mod q = a − q × d, where computing the quotient

$$d = \left\lfloor \frac{a}{q} \right\rfloor$$

involves a division, which is relatively costly to implement. The key idea of Barrett reduction is to replace the original computation of d by

$$\mathrm{apprd} = \left\lfloor \frac{\left\lfloor a / 2^{s} \right\rfloor \cdot \left\lfloor 2^{s+t} / q \right\rfloor}{2^{t}} \right\rfloor,$$

where s and t are two configurable integer parameters and ⌊·⌋ denotes rounding down. Division by a power of 2 can be implemented as a shift, and ⌊2^{s+t}/q⌋ depends only on q and can be precomputed. Barrett reduction therefore achieves an effect similar to division using only multiplications and shifts, improving the efficiency of the modulo operation.
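The replacement of the quotient can be checked numerically. The Python sketch below, with hypothetical helper names and an arbitrary example modulus q = 97, computes both the exact quotient and the shift-and-multiply approximation; it assumes the conventional parameters s = k − 2 and t = k + 3 discussed below and operands with a < q².

```python
def barrett_quotient(a: int, q: int, s: int, t: int) -> int:
    """Approximate floor(a / q) using shifts and one multiplication by a precomputed constant."""
    u = (1 << (s + t)) // q          # precomputed once per modulus: floor(2^(s+t) / q)
    return ((a >> s) * u) >> t       # floor(floor(a / 2^s) * u / 2^t)

q = 97                               # illustrative modulus, bit width k = 7
k = q.bit_length()
s, t = k - 2, k + 3
for a in (0, 1, 500, 9000, 96 * 96):
    d = a // q                       # exact quotient (needs a true division)
    apprd = barrett_quotient(a, q, s, t)
    assert 0 <= d - apprd <= 1       # never overshoots, off by at most 1 here

print([(a // q, barrett_quotient(a, q, s, t)) for a in (500, 9000, 9216)])
# [(5, 4), (92, 92), (95, 94)]
```

The occasional shortfall of 1 in the approximate quotient is what the final comparison step has to absorb.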

Modular multiplication is an operation commonly used in homomorphic encryption systems: two integers a and b are multiplied and the product is reduced modulo q, i.e., a × b mod q is computed. Modular multiplication can be implemented based on the conventional Barrett reduction algorithm.

In an HPU, modular multiplication is performed by a modular multiplication module implemented in hardware. The flow and principles of modular multiplication using the Barrett reduction algorithm in such a module are described below.

Assume that the maximum bit width supported by the modular multiplication module is n, i.e., n is the upper bound on the bit width of a single modulus of the residue number system, and that the input modulus q currently has bit width k, with 1 < k ≤ n. In conventional modular multiplication, the operands a and b must be smaller than the modulus q, i.e., 0 ≤ a < q and 0 ≤ b < q. The modular multiplication module computes the modular product of a and b with respect to q and outputs the integer c = a × b mod q.

The specific calculation process based on the modular multiplication module is as follows.

In the pre-operation stage, the two parameter values s and t are chosen first, and the target value is then calculated:

$$u = \left\lfloor \frac{2^{s+t}}{q} \right\rfloor$$

The modular multiplication is then performed as follows:

(1) Calculate the first multiplication A = a × b;

(2) Shift A right by s bits to obtain $A_h = \lfloor A / 2^{s} \rfloor$;

(3) Calculate the second multiplication sftd = A_h × u;

(4) Shift sftd right by t bits to obtain $\mathrm{apprd} = \lfloor \mathrm{sftd} / 2^{t} \rfloor$;

(5) Calculate the third multiplication apprA = apprd × q;

(6) Calculate the subtraction rem = A − apprA;

(7) Compare rem with q and obtain c from the result of the comparison.

The above is the conventional Barrett modular multiplication method. Its main idea is to replace the division $d = \lfloor A/q \rfloor$ by the multiplications and shifts of steps (2) to (4), which produce $\mathrm{apprd} = \lfloor \lfloor A/2^{s} \rfloor \cdot u / 2^{t} \rfloor$, thereby saving computation. If all rounding-down operations ⌊·⌋ are removed from the expressions of the true quotient d and apprd, the two are exactly equal; with rounding, however, d and apprd differ by some amount δ, whose magnitude is approximately:

$$\delta \le \frac{A}{2^{s+t}} + \frac{2^{s}}{q} \qquad (1)$$

The range of δ depends on the choice of s and t. From the logic of steps (5) and (6), if δ is large, step (7) requires several comparisons in sequence: rem ≥ δq? rem ≥ (δ − 1)q? …, rem ≥ q? If δ is very small, e.g., δ < 1, it is only necessary to determine whether rem is greater than or equal to q. That is, step (7) becomes:

(7') If rem ≥ q, then c = rem − q; otherwise, c = rem.

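Therefore, to simplify the comparison, δ must be kept small enough. To this end, s = k − 2 and t = k + 3 are typically chosen; since A < 2^{2k} and q ≥ 2^{k−1}, formula (1) then gives

$$\delta \le \frac{A}{2^{2k+1}} + \frac{2^{k-2}}{q} < \frac{2^{2k}}{2^{2k+1}} + \frac{2^{k-2}}{2^{k-1}} = 1.$$

Hence, in the conventional scheme, the two shift parameters are set as follows: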

s = k − 2, t = k + 3    (2)
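Putting steps (1) to (7') together with the parameters of formula (2), a minimal software model of the conventional flow might look like the following. It is a sketch for checking the arithmetic, not a description of the hardware; the function name is hypothetical, and it assumes q ≥ 2 and 0 ≤ a, b < q.

```python
def barrett_modmul_conventional(a: int, b: int, q: int) -> int:
    """Conventional Barrett modular multiplication, steps (1)-(7'), with s = k - 2, t = k + 3.

    Assumes q >= 2 (bit width k > 1) and 0 <= a, b < q.
    """
    k = q.bit_length()
    s, t = k - 2, k + 3                  # formula (2)
    u = (1 << (s + t)) // q              # pre-operation: target value u
    big_a = a * b                        # (1) first multiplication A
    a_h = big_a >> s                     # (2) shift right by s bits
    sftd = a_h * u                       # (3) second multiplication
    apprd = sftd >> t                    # (4) shift right by t bits
    appr_a = apprd * q                   # (5) third multiplication
    rem = big_a - appr_a                 # (6) subtraction
    return rem - q if rem >= q else rem  # (7') single comparison

print(barrett_modmul_conventional(12, 34, 97))   # 20, since 408 = 4 * 97 + 20
```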

From the viewpoint of hardware structure, to perform the above operations the modular multiplication module generally includes several multipliers for the multiplications in steps (1), (3) and (5), and several shift units for the shifts in steps (2) and (4).

According to a typical arrangement, the modular multiplication module comprises at least three multipliers M1, M2 and M3, each performing one of the three multiplications above. These multipliers may be implemented based on DSPs (digital signal processors).

The bit width required for each multiplier follows from the operand bit widths in the steps above. Specifically, the operand bit widths of multiplier M1 are k-bit × k-bit, those of multiplier M2 are (2k−s)-bit × (s+t−k+1)-bit, and those of multiplier M3 are (k+1)-bit × k-bit. Because a hardware operator must accommodate the largest widths over all admissible cases, the actual hardware bit widths of the three multipliers M1, M2 and M3 are n-bit × n-bit, (n+2)-bit × (n+2)-bit and (n+1)-bit × n-bit, respectively.
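The worst-case widths can be reproduced by evaluating the three operand-width expressions for every admissible k. This short sketch (hypothetical function names) does so for the conventional parameters s = k − 2, t = k + 3 and an example n = 32.

```python
def conventional_operand_widths(k: int) -> list[tuple[int, int]]:
    """Operand widths (bits) of M1, M2, M3 for modulus width k with s = k - 2, t = k + 3."""
    s, t = k - 2, k + 3
    return [(k, k),                        # M1: a x b
            (2 * k - s, s + t - k + 1),    # M2: Ah x u
            (k + 1, k)]                    # M3: apprd x q

def hardware_widths(n: int) -> list[tuple[int, int]]:
    """A hardware multiplier must cover the worst case over all 1 < k <= n."""
    per_k = [conventional_operand_widths(k) for k in range(2, n + 1)]
    return [(max(w[i][0] for w in per_k), max(w[i][1] for w in per_k)) for i in range(3)]

print(hardware_widths(32))   # [(32, 32), (34, 34), (33, 32)]
```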

In addition, the modular multiplication module needs shift units to perform the shift operations. In one implementation, shifting is performed by a shift circuit. However, considering circuit delay, a more preferred solution implements the data shift with mapping units formed from hardware look-up tables (LUTs).

Fig. 1 shows a schematic diagram of a data shift implemented by a mapping unit. As shown, assume the input value is represented as a v-bit string that needs to be shifted right by s bits; in the example of Fig. 1, v = 8. Each bit of the result can then be regarded as one bit selected from the v input bits according to the specific value of s. For example, for the bit numbered 5 in the result illustrated in Fig. 1 (positions numbered from the most significant end), if s = 3, the bit value 1 at position 2 of the 8-bit input string is selected; if s = 4, the bit value 0 at position 1 is selected.

The above selection logic may be embodied as a mapping implemented by a mapping unit. The input of the mapping unit comprises a data path and a selection path, the data path corresponding to the input bit string. Assuming that the right shift number s has m possible values, the select path may support m select signals. The output of the mapping unit is the bit value of a single bit in the shift result. The mapping unit is used for mapping the combination of the data input and the selection signal into a single-bit output.

In a simple implementation, the data path of the mapping unit may have v inputs corresponding to the v bits of the input string, and the selection path may have m lines, one for each of the m choices. In operation, the v-bit string is fed to the data path and, among the m select lines, the line corresponding to the chosen shift amount s is enabled, e.g., the s-th line is set to 1 and all others to 0. In this way, the mapping unit implements a (v + m)-to-1 mapping.
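To make the (v + m)-to-1 view concrete, the sketch below models a single mapping unit with a one-hot select path in Python. It numbers bits from the least significant end (index 0), unlike the left-to-right numbering in Fig. 1; the function and argument names are illustrative assumptions, and the model is behavioral rather than a LUT netlist.

```python
def mapping_unit(j: int, data: list[int], select: list[int], shifts: list[int]) -> int:
    """One mapping unit: v data bits plus m one-hot select lines -> output bit j of (data >> s).

    data[i] is input bit i (LSB at index 0); select[i] == 1 enables shift amount shifts[i].
    """
    if sum(select) != 1:
        raise ValueError("exactly one select line must be enabled")
    s = shifts[select.index(1)]
    src = j + s                                  # right shift: output bit j <- input bit j + s
    return data[src] if src < len(data) else 0   # bits shifted in from the top are 0

x = 0b10110100
data = [(x >> i) & 1 for i in range(8)]          # v = 8
shifts = [0, 1, 2, 3]                            # m = 4 supported shift amounts
select = [0, 0, 0, 1]                            # enable s = 3
result = [mapping_unit(j, data, select, shifts) for j in range(8)]
assert result == [((x >> 3) >> j) & 1 for j in range(8)]
```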

In practice, to reduce wiring and improve path utilization, the number of paths of the mapping unit can be reduced. Specifically, since the shift amount has only m choices, the output can only come from m of the data bits, so the data path can be reduced to m inputs; in this case, a selector placed before the data path input of the mapping unit picks the m possible source bits out of all v bits. For the selection path, ⌈log₂ m⌉ lines suffice: each line has 2 states, so the state combinations of log₂ m lines can support m choices. For example, when m = 4, two selection lines form the combinations (0, 0), (0, 1), (1, 0) and (1, 1), supporting 4 selection signals. In this case, the mapping unit implements an (m + log₂ m)-to-1 mapping.
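The binary encoding of the select path can be sketched as follows. `(m - 1).bit_length()` computes ⌈log₂ m⌉ for m ≥ 1 without floating point; the most-significant-line-first ordering and all names are assumptions made for illustration.

```python
def select_lines_needed(m: int) -> int:
    """Number of binary select lines for m choices, i.e. ceil(log2 m), for m >= 1."""
    return (m - 1).bit_length()

def encode_select(index: int, m: int) -> tuple[int, ...]:
    """States of the select lines (most significant line first) that choose candidate `index`."""
    width = select_lines_needed(m)
    return tuple((index >> i) & 1 for i in reversed(range(width)))

def reduced_mapping_unit(candidates: list[int], lines: tuple[int, ...]) -> int:
    """(m + log2 m)-to-1 mapping: m pre-selected candidate bits, chosen by the encoded lines."""
    index = 0
    for bit in lines:
        index = (index << 1) | bit
    return candidates[index]

print([encode_select(i, 4) for i in range(4)])   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

With m = n/2 + 1 = 33 (for n = 64), for instance, six select lines are needed instead of the 33 one-hot lines of the simple implementation.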

The above describes the mapping unit at the logical level. At the hardware level, the mapping unit can be implemented with hardware look-up tables (LUTs). LUTs are basic hardware units in integrated circuits and typically have fixed input and output specifications; for example, a common basic LUT has 4 inputs and 1 output and can only realize a 4-to-1 mapping. When the mapping unit requires a mapping with more inputs, its mapping logic can be implemented with multiple levels of LUTs. Given the required mapping relation, realizing a mapping unit as LUTs is well known in the art, and various design tools synthesize mapping logic into LUTs; this is not detailed here.

The above mapping unit produces one bit of the output. For a v-bit input string, the output string is generally also assumed to be v bits (since the shift amount may be s = 0). Therefore, v mapping units as shown in Fig. 1 are needed as a group to form an overall mapping unit that maps the input bit string to the shifted output bit string.
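A group of v such units, one per output bit, behaves as a right shifter. This hypothetical behavioral model (not a LUT netlist) builds the output one selected bit at a time and checks the equivalence against Python's `>>` operator.

```python
def shift_via_mapping_units(x: int, v: int, s: int) -> int:
    """Right-shift a v-bit value by s using v per-bit selections, one 'mapping unit' per output bit."""
    data = [(x >> i) & 1 for i in range(v)]            # input bit string, LSB at index 0
    out = [data[j + s] if j + s < v else 0             # unit j selects input bit j + s
           for j in range(v)]
    return sum(bit << j for j, bit in enumerate(out))  # reassemble the v-bit output

assert all(shift_via_mapping_units(x, 8, s) == x >> s for x in range(256) for s in range(8))
print(shift_via_mapping_units(0b10110100, 8, 3))       # 22
```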

Returning to the modular multiplication process described above: steps (2) and (4) of Barrett-based modular multiplication both require a right shift, which is performed using the mapping units described above.

In step (2), the input data is shifted right by s bits. As described above, s is set to s = k − 2 to keep δ small enough. Since 1 < k ≤ n, k has n − 1 possible values, and s correspondingly has n − 1 possible values. Therefore, when the shift of step (2) is performed using LUT-based mapping units, the selection paths need to support n − 1 selection signals, i.e., at least ⌈log₂(n − 1)⌉ selection lines are needed.

In step (4), the input data is shifted right by t bits. As described above, t = k + 3, so t likewise has n − 1 possible values. Therefore, when the shift of step (4) is performed using mapping units, the selection paths also need to support n − 1 selection signals.

Therefore, with the conventional settings of s and t (i.e., s = k − 2, t = k + 3), at least two groups of LUT-based mapping units are required to perform the shifts, and the selection path of each mapping unit in each group must support n − 1 selection signals, which consumes considerable hardware resources for path selection.

On the other hand, for the bound of formula (1) to stay below 1, A < 2^{2k} must hold, which generally requires a < q and b < q. Actual homomorphic schemes involve modulus switching, which needs to compute c = a × 1 mod q with a > q without introducing additional operators. The conventional method above supports only the case a < q, b < q and cannot support such an operation.
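The failure mode can be reproduced with small numbers. The sketch below uses hypothetical values (n = 8, q = 5 so k = 3, a = 255, b = 1), runs steps (1) to (6) with the conventional parameters s = k − 2, t = k + 3, and then applies the single correction (7'); because rem ends up equal to 2q, one subtraction is not enough.

```python
def conventional_rem(big_a: int, q: int) -> int:
    """Steps (2)-(6) with s = k - 2, t = k + 3; returns rem before the final comparison."""
    k = q.bit_length()
    s, t = k - 2, k + 3
    u = (1 << (s + t)) // q
    apprd = ((big_a >> s) * u) >> t
    return big_a - apprd * q

q, a, b = 5, 255, 1                 # hypothetical modulus-switching style input with a > q
rem = conventional_rem(a * b, q)
c = rem - q if rem >= q else rem    # single correction (7')
print(rem, c, a % q)                # 10 5 0: the result c differs from a mod q
```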

In view of these shortcomings of existing schemes, the embodiments of this specification provide an improved scheme that sets the shift parameters s and t differently, saving hardware resources and better supporting modulus switching.

FIG. 2 illustrates a method flow diagram for modular multiplication operations, according to one embodiment. The method is performed using a modular multiplication module that is adapted to a modulus having a highest bit width n. FIG. 3 illustrates a schematic diagram of a modular multiplication module, according to one embodiment. As shown in fig. 3, the modular multiplication module includes a plurality of multipliers and an adder A1, where the plurality of multipliers specifically includes a first multiplier M1, a second multiplier M2, and a third multiplier M3. The modular multiplication module further comprises a mapping unit group 100 and a wiring connection 200, both for implementing the shift operation.

Below, the procedure for performing modular multiplication with the modular multiplication module of this improved embodiment is described with reference to Figs. 2 and 3.

As shown in Fig. 2, first, in step S21, a first value s is selected from m possible values according to the actual bit width k of the current modulus q, where k ≤ n and m = n/2 + 1.

This step involves parameter configuration. Based on analysis, the first shift parameter s can be set according to the following formula (3):

if k ≤ n/2, s = 0; if k > n/2, s = k − 2    (3)

It can be seen that when k > n/2, k has n/2 possible values (n being even), and s correspondingly has n/2 possible values; when k ≤ n/2, s has the single value 0. Thus, the first value s has m = n/2 + 1 possible values in total. In step S21, the specific value of s is selected from these m values according to the actual bit width k.
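Formula (3) and the resulting count m = n/2 + 1 can be expressed in a few lines of Python. The sketch assumes n is even (as m = n/2 + 1 implies) and uses the integer test 2k ≤ n in place of k ≤ n/2; the function names are illustrative.

```python
def first_value_s(k: int, n: int) -> int:
    """Formula (3): s = 0 if k <= n/2, otherwise s = k - 2 (assumes n even, 1 < k <= n)."""
    if not 1 < k <= n:
        raise ValueError("modulus bit width k must satisfy 1 < k <= n")
    return 0 if 2 * k <= n else k - 2   # 2k <= n is the integer form of k <= n/2

def selectable_s_values(n: int) -> list[int]:
    """All distinct values s can take, i.e. the choices the LUT select path must support."""
    return sorted({first_value_s(k, n) for k in range(2, n + 1)})

print(first_value_s(3, 8), first_value_s(6, 8), selectable_s_values(8))
# 0 4 [0, 3, 4, 5, 6]   (m = 8/2 + 1 = 5 values)
```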

Further, the second shift parameter t is set to a fixed value, which is referred to as a second value. Specifically, the second value t=n+3 is set.

After the first value s and the second value t are configured, the target value $u = \lfloor 2^{s+t}/q \rfloor$ can be calculated in the pre-operation stage, in the same way as in the conventional scheme.

Then, in step S22, the first multiplier M1 calculates a first bit string corresponding to the first product A of the first operand a and the second operand b. This step corresponds to step (1) of the conventional flow.

As shown in Fig. 3, the operands a and b are input to the first multiplier M1, which outputs the first bit string L1.

Unlike the conventional scheme, this embodiment requires the operands to satisfy either a < q and b < q, or a < 2ⁿ and b = 1. In the latter case, a may be greater than q.

Then, in step S23, the mapping unit group shifts the first bit string right by s bits. As described above, since the first value s has m possible values in total, the selection paths of the mapping units in the group only need to support m = n/2 + 1 selection signals.

Specifically, in this step, the first bit string is input to the data path of each mapping unit, and the selection path of each mapping unit is set to enable the path corresponding to the first value s, giving the second bit string L2, i.e., the first bit string L1 shifted right by s bits.

Inputting the first bit string L1 to the data path of a mapping unit covers both the case where all bits of the first bit string are fed into the data path and the case where only some bits (e.g., m bits) picked by a selector are fed in. The principle has been described above in connection with Fig. 1 and is not repeated here.

Accordingly, in the schematic of Fig. 3 the output of the first multiplier M1 is shown directly connected to the mapping unit group 100; this is only for brevity and clarity, and other elements such as selectors may be connected in between.

Since the maximum possible bit widths of a and b are both n, the maximum bit width of the first product A is 2n. Thus, in one example, the mapping unit group 100 includes 2n mapping units, each implemented based on hardware LUTs, whose input paths each include a data path and a selection path supporting m selection signals; each mapping unit outputs one bit of the second bit string L2. In one specific example, a single mapping unit may comprise several hardware LUTs arranged in multiple levels.

Further, in step S24, the second multiplier M2 calculates a third bit string corresponding to the second product of the second bit string and the pre-calculated target value. This step corresponds to step (3) of the conventional flow, the second product corresponding to sftd.

As can be seen from fig. 3, the second bit string L2 and the target value u are input to the second multiplier M2, and the multiplier M2 performs an operation to output the third bit string L3.

Then, in step S25, the third bit string is shifted right by t bits (the second value) through the preset wiring connection to obtain a fourth bit string, where t is a constant.

Specifically, t may be set to t=n+3. Since n is a constant value, t is also a constant value.

Since t is a fixed value independent of the current modulus q, the shift in this step is a fixed mapping that requires no selection, and it can therefore be implemented directly by a simple, fixed wiring connection. For example, the wiring connection may connect the i-th bit of the input bit string to the (i − t)-th bit of the output bit string (bits numbered from the least significant end), with the lowest t input bits left unconnected.
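Behaviorally, the fixed wiring is just a static re-indexing of bits. The sketch below models it for a hypothetical n = 8 (so t = 11), listing bits from the least significant end; in hardware no logic is involved, only wires, so the function merely describes which input bit drives which output bit.

```python
N = 8          # hypothetical maximum modulus bit width
T = N + 3      # fixed second value t = n + 3

def wire_shift(bits_in: list[int], t: int = T) -> list[int]:
    """Fixed wiring: input bit i drives output bit i - t (bits listed LSB first).

    The lowest t input bits have no destination and are simply dropped.
    """
    return [bits_in[i] for i in range(t, len(bits_in))]

x = 0b101101110010111001101011
bits = [(x >> i) & 1 for i in range(24)]
out = wire_shift(bits)
assert sum(bit << j for j, bit in enumerate(out)) == x >> T
```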

Accordingly, in Fig. 3, the third bit string L3 is connected through the wiring connection 200 directly to the fourth bit string L4, shifted right by t bits. The wiring connection 200 consists only of direct wires; the right shift is achieved without any mapping unit or LUT.

Next, in step S26, a modulo result of the first product with respect to the current modulus q is determined according to the fourth bit string.

Specifically, in this step, the third multiplier M3 first calculates a third product of the fourth bit string L4 and the current modulus q. This corresponds to step (5) of the conventional flow, the third product corresponding to apprA. The adder A1 then calculates the difference between the first product A and the third product apprA, giving a first difference rem; this corresponds to step (6). It will be appreciated that by negating the subtrahend, the subtraction can be converted into an addition performed by the adder. Finally, the modulo result is obtained by comparing the first difference rem with the current modulus q: if rem ≥ q, then c = rem − q; otherwise, c = rem. Components such as the comparator are not shown in Fig. 3 because they involve no improvement; they can be arranged by conventional means.
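The whole improved flow S21 to S26 can be modeled in software to check its arithmetic. The following Python sketch is a behavioral model under the stated operand assumptions, not the patented circuit; the function and variable names are hypothetical.

```python
def improved_modmul(a: int, b: int, q: int, n: int) -> int:
    """Software model of steps S21-S26 (a sketch, not the hardware itself).

    Assumes n is even, 1 < k <= n for the bit width k of q, and either
    case a (0 <= a, b < q) or case b (0 <= a < 2**n, b == 1).
    """
    k = q.bit_length()
    if not 1 < k <= n:
        raise ValueError("modulus bit width k must satisfy 1 < k <= n")
    s = 0 if 2 * k <= n else k - 2       # S21: formula (3), one of m = n/2 + 1 values
    t = n + 3                            # second value: fixed for every q
    u = (1 << (s + t)) // q              # pre-operation: target value
    big_a = a * b                        # S22: first product A (multiplier M1)
    l2 = big_a >> s                      # S23: LUT mapping-unit shift by s
    l3 = l2 * u                          # S24: second product (multiplier M2)
    l4 = l3 >> t                         # S25: fixed wiring shift by t
    rem = big_a - l4 * q                 # S26: third product (M3) and difference (adder A1)
    return rem - q if rem >= q else rem  # S26: single comparison with q

print(improved_modmul(255, 1, 5, 8))     # 0, i.e. 255 mod 5, with a > q
```

The same input that defeats the conventional parameters (a = 255, b = 1, q = 5) is handled here with a single comparison, which is the modulus-switching case the embodiment targets.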

Thus, based on the Barrett reduction principle, modular multiplication is implemented using the modular multiplication module shown in Fig. 3.

The feasibility and technical advantages of the embodiment are demonstrated below.

(I) The multipliers need not be changed

As described above, in the conventional scheme the bit widths of the three multipliers M1, M2 and M3 are n-bit × n-bit, (n+2)-bit × (n+2)-bit and (n+1)-bit × n-bit, respectively.

In the method of this embodiment, the operand bit widths of the three multiplications are still k-bit × k-bit, (2k−s)-bit × (s+t−k+1)-bit and (k+1)-bit × k-bit, respectively. As in the conventional algorithm, the widths for the first multiplier M1 and the third multiplier M3 depend only on k, so their maximum is clearly unchanged.

For the second multiplier M2, the bit widths depend on s and t and must be analyzed separately. In this embodiment t = n + 3. When k ≤ n/2, s = 0, so 2k − s = 2k ≤ n < n + 2 and s + t − k + 1 = n + 4 − k ≤ n + 2 (since k ≥ 2). When k > n/2, s = k − 2, so 2k − s = k + 2 ≤ n + 2 and s + t − k + 1 = n + 2. Thus, in every case, the existing bit width (n+2)-bit × (n+2)-bit of the second multiplier M2 is sufficient for the settings of s and t in this embodiment.
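The width argument for M2 can be checked mechanically for a range of even n. This sketch (hypothetical names) evaluates (2k − s) and (s + t − k + 1) under formula (3) with t = n + 3 and confirms that neither exceeds n + 2.

```python
def m2_operand_widths(k: int, n: int) -> tuple[int, int]:
    """Operand widths of M2, (2k - s) and (s + t - k + 1), under formula (3) and t = n + 3."""
    s = 0 if 2 * k <= n else k - 2
    t = n + 3
    return 2 * k - s, s + t - k + 1

def fits_existing_m2(n: int) -> bool:
    """True if the conventional (n+2)-bit x (n+2)-bit M2 covers every 1 < k <= n."""
    return all(max(m2_operand_widths(k, n)) <= n + 2 for k in range(2, n + 1))

print([m2_operand_widths(k, 8) for k in range(2, 9)])
# [(4, 10), (6, 9), (8, 8), (7, 10), (8, 10), (9, 10), (10, 10)]
```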

(II) Ensuring that the difference delta is sufficiently small

If k ≤ n/2, then s = 0 and t = n+3.

Case a: 0 ≤ a < q and 0 ≤ b < q.

For case a:

Case b: 0 ≤ a < 2^n and b = 1.

For case b:

If k > n/2, then s = k-2 and t = n+3.

For case a:

For case b:

From the above, for every value of k and in both cases of a and b, the difference is guaranteed to be less than 1, so a single comparison of rem against q suffices to obtain the modular multiplication result.
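The per-case formulas are not reproduced in this text. As a hedged reconstruction (one standard way to bound the Barrett estimation error, not necessarily the patent's own formulas), assume the target value is μ = ⌊2^(s+t)/q⌋, write A = ab for the first product and Q̂ for the value of the fourth bit string, and use q ≥ 2^(k-1) with q ≥ 2:

```latex
\begin{aligned}
&\mu = \left\lfloor \frac{2^{s+t}}{q} \right\rfloor, \qquad
 \hat{Q} = \left\lfloor \frac{\lfloor A/2^{s} \rfloor \, \mu}{2^{t}} \right\rfloor \\
&\hat{Q} \le \frac{(A/2^{s})(2^{s+t}/q)}{2^{t}} = \frac{A}{q}
 \;\Rightarrow\; \mathrm{rem} = A - \hat{Q}q \ge 0 \\
&\hat{Q} > \frac{(A/2^{s}-1)(2^{s+t}/q-1)}{2^{t}} - 1
 = \frac{A}{q} - \frac{A}{2^{s+t}} - \frac{2^{s}}{q} + \frac{1}{2^{t}} - 1 \\
&\delta := \frac{A}{q} - \hat{Q} < 1 + \varepsilon, \qquad
 \varepsilon := \frac{A}{2^{s+t}} + \frac{2^{s}}{q}, \qquad
 \varepsilon < 1 \;\Rightarrow\; 0 \le \mathrm{rem} = q\,\delta < 2q
\end{aligned}

\begin{aligned}
&k \le n/2,\; s = 0,\; t = n+3 \text{ (cases a and b both give } A < 2^{n}\text{)}: &
 \varepsilon &< \frac{2^{n}}{2^{n+3}} + \frac{1}{2} = \frac{5}{8} \\
&k > n/2,\; s = k-2,\; t = n+3,\ \text{case a } (A < 2^{2k}): &
 \varepsilon &< 2^{k-n-1} + \frac{2^{k-2}}{2^{k-1}} \le \frac{1}{2} + \frac{1}{2} = 1 \\
&k > n/2,\; s = k-2,\; t = n+3,\ \text{case b } (A < 2^{n}): &
 \varepsilon &< 2^{-k-1} + \frac{1}{2} < 1 \\
&\text{conventional } s = k-2,\; t = k+3: &
 \varepsilon &= \frac{A}{2^{2k+1}} + \frac{2^{k-2}}{q}, \text{ which this bound keeps below 1 only for } A < 2^{2k}
\end{aligned}
```

Under these assumptions the extra error term stays below 1 in all four cases, so rem < 2q and one conditional subtraction is enough; the last line shows why the conventional shift setting needs A < 2^(2k).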

(III) Greatly reduced LUT resource consumption

As described above, with the settings of s and t in the conventional modular multiplication unit (i.e., s = k-2, t = k+3), at least two groups of LUT-based mapping units are required to perform the two shift operations, and the selection path of each mapping unit in each group must support n-1 selection signals.

In the modular multiplication module shown in fig. 3, by contrast, a group of mapping units is used only for the first shift operation, and its selection paths need to support only m = n/2+1 selection signals. The second shift operation is performed by a direct wiring connection and requires no LUT at all.

Thus, for the same supported parameter range, the modular multiplication module shown in fig. 3 can save about 75% of the LUT hardware resources required by the conventional scheme.
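The selection-signal counts can be reproduced by enumerating the shift amounts each scheme must support. The Python sketch below does this for n = 64 (an example value). The resource estimate rests on a simplifying assumption not stated in the text, namely that LUT cost scales linearly with the number of selectable shift amounts; the helper names are hypothetical, and the select-line count rounds log₂ m up to an integer.

```python
def conventional_shift_values(n: int) -> set[int]:
    """Distinct shift amounts s = k - 2 for modulus bit widths k = 2..n."""
    return {k - 2 for k in range(2, n + 1)}


def proposed_shift_values(n: int) -> set[int]:
    """Distinct first values s in this embodiment (s = 0 or s = k - 2)."""
    return {0 if k <= n // 2 else k - 2 for k in range(2, n + 1)}


def select_bits(m: int) -> int:
    """Select lines needed to encode m signals (log2 m, rounded up)."""
    return (m - 1).bit_length()


n = 64
m_conv = len(conventional_shift_values(n))  # n - 1
m_new = len(proposed_shift_values(n))       # n/2 + 1
# Rough estimate, assuming LUT cost grows linearly with the number of
# selectable shifts: two conventional groups vs. one proposed group.
saving = 1 - m_new / (2 * m_conv)
print(m_conv, m_new, select_bits(m_new), round(saving, 3))  # prints "63 33 6 0.738"
```

Under that linear assumption the estimate lands close to the roughly 75% quoted above; the actual figure depends on how synthesis maps the multiplexers onto LUTs.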

(IV) Better support for the modulus switching function

In the conventional scheme, the inequality in formula (1) holds only if A < 2^(2k), which in practice requires a < q and b < q. The case a > q is therefore not supported; that is, the operation c = a × 1 mod q with a > q, which arises in modulus switching, cannot be performed. Consequently, in modulus switching the original modulus must not exceed the square of the new modulus, which limits the supported parameter range.

In the scheme of this embodiment, as can be seen from formulas (5) and (7), the setting of t ensures the correctness of the algorithm even when 0 ≤ a < 2^n and b = 1, so the case a > q with b = 1 is supported. The modular multiplication module can therefore fully support modulus switching for any modulus with a bit width of at most n, without an additional dedicated modulus-switching operator module, which makes the hardware architecture simpler.
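The difference shows up directly on a modulus-switching style input (b = 1 with a wider than q). The Python sketch below runs a single-correction Barrett reduction with both shift settings. It assumes μ = ⌊2^(s+t)/q⌋, and the example values (n = 8, q = 5, a = 255) and function names are chosen purely for illustration.

```python
def barrett_single_correction(a: int, b: int, q: int, s: int, t: int) -> int:
    """Barrett estimate with shifts s and t, followed by one conditional subtraction."""
    mu = (1 << (s + t)) // q  # assumed target value floor(2**(s+t) / q)
    prod = a * b
    approx_quotient = ((prod >> s) * mu) >> t
    rem = prod - approx_quotient * q
    return rem - q if rem >= q else rem


def conventional_shifts(k: int, n: int) -> tuple[int, int]:
    # n is unused here; kept so both helpers share one signature.
    return k - 2, k + 3


def proposed_shifts(k: int, n: int) -> tuple[int, int]:
    return (0 if k <= n // 2 else k - 2), n + 3


# Modulus-switching style input: reduce an 8-bit a = 255 modulo a small q = 5.
n, q, a = 8, 5, 255
k = q.bit_length()
print(barrett_single_correction(a, 1, q, *conventional_shifts(k, n)))  # prints 5 (wrong)
print(barrett_single_correction(a, 1, q, *proposed_shifts(k, n)))      # prints 0 (= 255 % 5)
```

With the conventional shifts the quotient estimate is 49 instead of 51, so a single subtraction leaves 5; with t = n + 3 the estimate is 50 and the single correction yields the correct residue.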

As the above analysis shows, the modular multiplication module of the improved scheme consumes fewer hardware resources while preserving calculation accuracy, and has a simpler hardware structure; moreover, performing the shift through a direct wiring connection instead of multi-stage LUTs improves computational performance.

On this basis, another embodiment provides a homomorphic processing unit (HPU) chip that includes the modular multiplication module described above.

Further, a computing device is provided, which includes the HPU chip.

Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.

The foregoing embodiments describe the general principles of the present invention in further detail. They are not intended to limit the scope of the invention; any modifications, equivalent replacements, improvements, and the like made on the basis of the teachings of the invention are intended to fall within its scope.

Claims

1. A modular multiplication method executed by a modular multiplication module, wherein the modular multiplication module is applicable to moduli with a maximum bit width of n and comprises a first multiplier, a second multiplier, a unit group of mapping units formed based on a hardware lookup table LUT, and a preset wiring connection; an input path of each mapping unit in the unit group comprises a data path and a selection path, the selection path supports m selection signals, and m = n/2+1; the method comprises the following steps:

2. The method of claim 1, wherein determining a first value s from m possible values based on an actual bit width k of a current modulus q comprises:

if the actual bit width k ≤ n/2, determining a first value s = 0;

if the actual bit width k > n/2, determining a first value s = k-2.

3. The method of claim 1, wherein the unit group comprises 2n mapping units, each implemented based on a hardware LUT, the input paths of which each comprise the data path and the selection path; each mapping unit outputs one bit of the second bit string.

4. The method of claim 3, wherein a single mapping unit comprises multiple stages of hardware LUTs.

5. The method of claim 1, wherein the second value t = n+3.

6. The method of claim 1, wherein determining the modulo result of the first product relative to the current modulus q based on the fourth bit string comprises:

calculating, using a third multiplier, a third product of the fourth bit string and the current modulus q;

calculating the difference between the first product and the third product by using an adder to obtain a first difference value;

and comparing the first difference value with the current modulus q to obtain the modulus taking result.

7. The method of claim 1, wherein the first multiplier a is less than 2^n and the second multiplier b is 1; or the first multiplier a and the second multiplier b are each less than the current modulus q.

8. The method of claim 1, wherein the number of selection paths of the mapping unit is log₂ m.

9. A modular multiplication module for performing modular multiplication operations on moduli with a maximum bit width of n, the modular multiplication module comprising a first multiplier, a second multiplier, a unit group of mapping units formed based on a hardware lookup table LUT, and a preset wiring connection, wherein the input paths of the mapping units in the unit group comprise data paths and selection paths, the selection paths support m selection signals, and m = n/2+1; wherein, during the modular multiplication operation:

10. The modular multiplication module of claim 9, wherein the first value s satisfies:

when the actual bit width k ≤ n/2, the first value s = 0;

when the actual bit width k > n/2, the first value s = k-2.

11. The modular multiplication module of claim 9, wherein the unit group comprises 2n mapping units, each implemented based on a hardware LUT, the input paths of which each comprise the data path and the selection path; each mapping unit outputs one bit of the second bit string.

12. The modular multiplication module of claim 11, wherein a single mapping unit comprises multiple stages of hardware LUTs.

13. The modular multiplication module of claim 9, wherein the second value t = n+3.

14. The modular multiplication module of claim 9, wherein the number of selection paths of the mapping unit is log₂ m.

15. A homomorphic processing unit HPU chip comprising a modular multiplication module as claimed in any one of claims 9 to 14.

16. A computing device comprising the homomorphic processing unit HPU chip of claim 15.