CN115587274A - Polynomial multiplication accelerating method and device
- Publication number
- CN115587274A (application number CN202211245657.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F17/10: Complex mathematical operations
- G06F7/50: Adding; Subtracting
- G06F7/52: Multiplying; Dividing
Abstract
The invention provides a method and a device for accelerating polynomial multiplication, wherein the device comprises m preprocessing external blocks, an input sorting module, k preprocessing internal blocks, a central multiplier array, k post-processing internal blocks, an output integration module, and m post-processing external blocks.
Description
Technical Field
The invention relates to a method and a device for accelerating polynomial multiplication.
Background
In fields such as digital signal processing, cryptography, and coding theory, one often needs to multiply two polynomials quickly. The cycle count, total delay, and resource consumption of polynomial multiplication are important factors in the area-efficiency ratio of the overall hardware architecture in these applications, so many practical optimization methods for polynomial multiplication have been proposed.
Since it was proposed in 1962 (reference: Karatsuba, Anatolii and Ofman, Yu., "Multiplication of Multidigit Numbers on Automata," Soviet Physics Doklady 7, 595), the Karatsuba algorithm has been regarded for decades as one of the best ways to reduce the complexity of polynomial multiplication. It reduces the multiplication complexity of N-term polynomial multiplication from $N^2$ to $N^{\log_2 3}$, with an addition complexity of no more than $6N^{\log_2 3} - 8N + 2$. In practical applications, however, polynomial multiplications with very wide coefficients are sometimes encountered, for example in modular multiplication over Galois fields in the study of elliptic curves. Usually a conventional multiplier, or a multiplier IP provided by the FPGA, is used as the central multiplier. When the coefficient bit width reaches tens or hundreds of bits, it may exceed the supported range of the multiplier IP, and a conventional multiplier design may suffer from excessive operation complexity and hardware area, so the polynomial multiplier in this case can adversely affect the performance of the whole hardware implementation.
There are many implementations of polynomial multiplication and integer multiplication based on the Karatsuba algorithm. For two two-term polynomials $A(x) = a_0 + a_1 x$ and $B(x) = b_0 + b_1 x$, the classical multiplication algorithm is:
$C(x) = a_0 b_0 + (a_0 b_1 + a_1 b_0)x + a_1 b_1 x^2$
This algorithm requires four multiplications and one addition. The two-term polynomial multiplication algorithm $KA_2$ based on the Karatsuba algorithm is:
$C(x) = a_0 b_0 + ((a_0 + a_1)(b_0 + b_1) - a_0 b_0 - a_1 b_1)x + a_1 b_1 x^2$
This algorithm requires three multiplications and four additions. Given that the delay and resource consumption of multiplication are far higher than those of addition, the algorithm reduces the complexity of two-term multiplication to a certain extent. Based on the Karatsuba two-term multiplication, a recursive Karatsuba algorithm for $2^n$ terms can be obtained, which can be used for fast multiplication of two $2^n$-term polynomials. The specific algorithm is shown as Algorithm 1.
Algorithm 1: Recursive Karatsuba $2^n$-term polynomial multiplication algorithm [algorithm listing missing]
By calculation, the algorithm reduces the multiplication complexity from $4^n$ for the traditional algorithm to $3^n$, and its addition complexity is no more than $2 \cdot 3^{n+1} - 2^{n+3} + 2$. Besides the $2^n$-term Karatsuba polynomial multiplication, there are also Karatsuba algorithms for 3, 5, and 7 terms, from which Karatsuba polynomial multiplication for an arbitrary integer number of terms can be built using a method similar to the recursive algorithm described above. The reference Weimerskirch, André and Christof Paar, "Generalizations of the Karatsuba Algorithm for Efficient Implementations," IACR Cryptol. ePrint Arch. 2006 (2006): 224, also demonstrates that for any positive integer N, the ratio of the hardware area of Karatsuba polynomial multiplication to that of conventional polynomial multiplication is not less than [formula missing].
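The recursion described above can be sketched in Python as follows. This is a minimal illustration, not the listing of Algorithm 1 (which is missing from this text): the function names are hypothetical, coefficients are stored lowest degree first, and the multiplication counter is added only to make the $3^n$ count visible.

```python
def schoolbook_poly(a, b):
    """Classical polynomial product, used here only as a reference."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def karatsuba_poly(a, b):
    """Multiply two coefficient lists of equal length 2**n.

    Returns (product coefficients, number of coefficient multiplications).
    """
    n = len(a)
    assert n == len(b) and n > 0 and n & (n - 1) == 0, "length must be 2**n"
    if n == 1:
        return [a[0] * b[0]], 1
    h = n // 2
    a_lo, a_hi = a[:h], a[h:]
    b_lo, b_hi = b[:h], b[h:]
    # Three half-size products instead of four (the KA_2 identity).
    lo, m_lo = karatsuba_poly(a_lo, b_lo)
    hi, m_hi = karatsuba_poly(a_hi, b_hi)
    mid, m_mid = karatsuba_poly(
        [x + y for x, y in zip(a_lo, a_hi)],
        [x + y for x, y in zip(b_lo, b_hi)],
    )
    # Middle term: (a_lo + a_hi)(b_lo + b_hi) - a_lo*b_lo - a_hi*b_hi.
    cross = [m - l - u for m, l, u in zip(mid, lo, hi)]
    out = [0] * (2 * n - 1)
    for i, v in enumerate(lo):
        out[i] += v
    for i, v in enumerate(cross):
        out[i + h] += v
    for i, v in enumerate(hi):
        out[i + 2 * h] += v
    return out, m_lo + m_hi + m_mid


product, mults = karatsuba_poly([1, 2, 3, 4], [5, 6, 7, 8])
```

For the 4-term inputs above (n = 2), the counter reports $3^2 = 9$ coefficient multiplications, against $4^2 = 16$ for the classical method.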
Disclosure of Invention
Purpose of the invention: the technical problem to be solved by the present invention is to provide a method and a device for accelerating polynomial multiplication, and in particular a method and a device for accelerating polynomial multiplication based on the Karatsuba architecture, wherein the method comprises:
inputting two sets of polynomial coefficients, each set having $p_1 p_2 \cdots p_m$ terms, where $p_1, p_2, \ldots, p_m$ are the 1st, 2nd, …, m-th prime factors of the number of terms, which may repeat;
processing the two sets of polynomial coefficients with the Karatsuba algorithms with term counts $p_1, p_2, \ldots, p_m$, applying all operations that the data stream undergoes before it reaches the multiplications, to obtain two sets of externally preprocessed data;
performing bit extraction and reordering on each of the two sets of externally preprocessed data to obtain sorted data;
processing the sorted data with the Karatsuba algorithms with term counts $p_{-1}, p_{-2}, \ldots, p_{-k}$, applying all operations that the data stream undergoes before it reaches the multiplications, to obtain two sets of internally preprocessed data, where $p_{-1}, p_{-2}, \ldots, p_{-k}$ are the 1st, 2nd, …, k-th prime factors specified according to the usage requirement. The usage requirement is determined by the multiplier area (resources) the user can accept. For example, to compute a 4-term, 64-bit polynomial multiplication, a traditional multiplier uses 256 DSPs and traditional Karatsuba uses 144 DSPs, whereas this method uses 108 DSPs if $p_{-1} = 2$ is set and 81 DSPs if $p_{-1} = p_{-2} = 2$ is set;
multiplying the corresponding data in the two sets of internally preprocessed data to obtain a set of preliminary product data;
processing the preliminary product data with the Karatsuba algorithms with term counts $p_{-1}, p_{-2}, \ldots, p_{-k}$, applying all operations that the data stream undergoes after the multiplications, to obtain internally post-processed data;
reordering, shifting, and adding the internally post-processed data to obtain integrated data;
processing the integrated data with the Karatsuba algorithms with term counts $p_1, p_2, \ldots, p_m$, applying all operations that the data stream undergoes after the multiplications, to obtain the final output data, namely the coefficients of the product of the two polynomials with $p_1 p_2 \cdots p_m$ terms.
The invention also provides a device for accelerating polynomial multiplication, which comprises m preprocessing external blocks, an input sorting module, k preprocessing internal blocks, a central multiplier array, k post-processing internal blocks, an output integration module, and m post-processing external blocks, where m and k are positive integers.
The m preprocessing external blocks receive two sets of polynomial coefficients, each set having $p_1 p_2 \cdots p_m$ terms, where $p_1, p_2, \ldots, p_m$ are the 1st, 2nd, …, m-th prime factors of the number of terms, which may repeat; they then process the two sets of polynomial coefficients with the Karatsuba algorithms with term counts $p_1, p_2, \ldots, p_m$, applying all operations that the data stream undergoes before it reaches the multiplications, to obtain two sets of externally preprocessed data;
the input sorting module performs bit extraction and reordering on each of the two sets of externally preprocessed data to obtain sorted data;
the k preprocessing internal blocks process the sorted data with the Karatsuba algorithms with term counts $p_{-1}, p_{-2}, \ldots, p_{-k}$, applying all operations that the data stream undergoes before it reaches the multiplications, to obtain two sets of internally preprocessed data, where $p_{-1}, p_{-2}, \ldots, p_{-k}$ are the 1st, 2nd, …, k-th prime factors specified according to the usage requirement;
the central multiplier array multiplies the corresponding data in the two sets of internally preprocessed data to obtain a set of preliminary product data;
the k post-processing internal blocks process the preliminary product data with the Karatsuba algorithms with term counts $p_{-1}, p_{-2}, \ldots, p_{-k}$, applying all operations that the data stream undergoes after the multiplications, to obtain internally post-processed data;
the output integration module performs reordering, shift operations, and addition operations on the internally post-processed data to obtain integrated data;
the m post-processing external blocks process the integrated data with the Karatsuba algorithms with term counts $p_1, p_2, \ldots, p_m$, applying all operations that the data stream undergoes after the multiplications, to obtain the final output data, namely the coefficients of the product of the two polynomials with $p_1 p_2 \cdots p_m$ terms.
Here $KA_{p_1}, KA_{p_2}, \ldots, KA_{p_m}$ respectively denote the Karatsuba algorithm modules with term counts $p_1, p_2, \ldots, p_m$; see Weimerskirch, André and Christof Paar, "Generalizations of the Karatsuba Algorithm for Efficient Implementations," IACR Cryptol. ePrint Arch. 2006 (2006): 224, and Montgomery, Peter L., "Five, six, and seven-term Karatsuba-like formulae," IEEE Transactions on Computers 54 (2005): 362-369;
$KA_{p_{-1}}, KA_{p_{-2}}, \ldots, KA_{p_{-k}}$ respectively denote the Karatsuba algorithm modules whose term counts are the 1st, 2nd, …, k-th prime factors $p_{-1}, p_{-2}, \ldots, p_{-k}$ specified according to the usage requirement; see the same two references;
a KA_pre block denotes the hardware that performs all operations of the Karatsuba algorithm that the data stream undergoes from the input up to the multiplications;
a KA_post block denotes the hardware that performs all operations of the Karatsuba algorithm that the data stream undergoes from the multiplications to the output.
The central multiplier array comprises a plurality of integer multipliers, whose number is determined by the structures of $KA_{p_1}, KA_{p_2}, \ldots, KA_{p_m}$ and $KA_{p_{-1}}, KA_{p_{-2}}, \ldots, KA_{p_{-k}}$. If the numbers of central multipliers corresponding to these structures are $l_1, l_2, \ldots, l_m$ and $l_{-1}, l_{-2}, \ldots, l_{-k}$ respectively, then the number of central multipliers is $l_1 l_2 \cdots l_m \cdot l_{-1} l_{-2} \cdots l_{-k}$.
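As a quick check of this count, the sketch below multiplies the per-layer multiplier numbers together. It assumes the count is the plain product of the $l$ values and uses $l = 3$ for every $KA_2$ layer (three multiplications, as stated in the Background); the function name is hypothetical.

```python
from math import prod


def central_multiplier_count(outer_l, inner_l):
    """Product of the per-layer multiplier counts l_1..l_m and l_-1..l_-k."""
    return prod(outer_l) * prod(inner_l)


KA2_MULTIPLIERS = 3  # a two-term Karatsuba layer needs three multiplications

# 4-term polynomials (m = 2, p_1 = p_2 = 2) with two internal layers
# (k = 2, p_-1 = p_-2 = 2): four KA_2 layers in total.
count = central_multiplier_count(
    [KA2_MULTIPLIERS, KA2_MULTIPLIERS],
    [KA2_MULTIPLIERS, KA2_MULTIPLIERS],
)
```

With these inputs the count is $3^4 = 81$, which agrees with the 81-DSP figure quoted for the 4-term, 64-bit example with $p_{-1} = p_{-2} = 2$.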
The input sorting module is configured to perform the following input sorting algorithm [algorithm listing missing; the final indices and group sizes below are given by formulas missing from this text]:
where $a\_i_0, a\_i_1, \ldots$ denote the 1st, 2nd, … binary integers of the first group of input data of the input sorting module, and $b\_i_0, b\_i_1, \ldots$ denote the 1st, 2nd, … binary integers of the second group of input data;
$a\_o_{00}, a\_o_{01}, \ldots$ denote the 1st, 2nd, … binary integers in the first subgroup of the first group of output data of the input sorting module, $a\_o_{10}, a\_o_{11}, \ldots$ denote the 1st, 2nd, … binary integers in the second subgroup, and so on up to the last subgroup;
$b\_o_{00}, b\_o_{01}, \ldots$ denote the 1st, 2nd, … binary integers in the first subgroup of the second group of output data of the input sorting module, $b\_o_{10}, b\_o_{11}, \ldots$ denote the 1st, 2nd, … binary integers in the second subgroup, and so on up to the last subgroup.
The output integration module is configured to perform the following output integration algorithm [algorithm listing missing]:
where $c\_i_{00}, c\_i_{01}, \ldots$ denote the 1st, 2nd, … binary integers in the first group of input data of the output integration module, $c\_i_{10}, c\_i_{11}, \ldots$ denote the 1st, 2nd, … binary integers in the second group of input data, and so on up to the last group;
and $c\_o_0, c\_o_1, \ldots$ denote the 1st, 2nd, … binary positive integers in the output data of the output integration module.
The input sorting module comprises a bit extraction module and an input reordering module;
the bit extraction module takes each number in the two groups of binary integer data and extracts, from low to high, bits 0 to t-1, bits t to 2t-1, and so on through the final slice [bit indices missing], combining each extracted slice into a new integer, where t is an integer set according to the usage requirement; the new integers obtained from each initial datum are placed in one group, forming a set of new arrays [counts missing];
the input reordering module takes the 1st, 2nd, … data of all arrays in the first half of the new arrays and splices them into new 1st, 2nd, … arrays, and does the same for all arrays in the second half of the new arrays [array counts and lengths missing].
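A minimal Python sketch of the slicing and reordering described above follows. It is not the patent's own input sorting algorithm, whose listing is missing from this text: it assumes the reordering amounts to a transpose (slice j of every input is gathered into output subgroup j), and the function names and the `num_slices` parameter are hypothetical.

```python
def slice_bits(value, t, num_slices):
    """Split a non-negative integer into t-bit slices, lowest slice first."""
    mask = (1 << t) - 1
    return [(value >> (j * t)) & mask for j in range(num_slices)]


def input_sort(values, t, num_slices):
    """Slice every input, then regroup so that subgroup j holds slice j of each input."""
    per_value = [slice_bits(v, t, num_slices) for v in values]
    # Transpose: rows indexed by input become rows indexed by slice position.
    return [list(group) for group in zip(*per_value)]


# Two 16-bit inputs cut into four 4-bit slices each.
sorted_groups = input_sort([0x1234, 0xABCD], t=4, num_slices=4)
```

In this sketch the two operand groups (a and b) would each be passed through `input_sort` separately, matching the statement that the two groups are sorted respectively.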
The output integration module comprises an output reordering module, a shift module array, and an addition array;
the output reordering module takes the 1st, 2nd, … data of all input arrays and splices them into new 1st, 2nd, … arrays [array counts and lengths missing];
the shift module array pads each of the 1st, 2nd, … reordered data in an array with high-order zeros and then shifts it left by 0, t, … bits [final shift amount missing] in binary, using shift registers, to obtain new data;
the addition array adds all the shifted data in each array with adders to obtain one sum per array, and the sums obtained from all arrays are output as the output data of the addition array.
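The reorder, shift, and add steps can be sketched in Python as below. This is an illustrative reading rather than the patent's output integration algorithm (its listing is missing from this text): it assumes input group j holds the values to be weighted by $2^{jt}$ for every output, and the function name is hypothetical.

```python
def output_integrate(groups, t):
    """Regroup per output, shift entry j left by j*t bits, and sum.

    groups[j][i] is assumed to be the j-th partial value of output i.
    """
    per_output = zip(*groups)  # output reordering (transpose)
    return [
        sum(value << (j * t) for j, value in enumerate(values))  # shift, then add
        for values in per_output
    ]


# Partial values may be wider than t bits; overlapping bits are resolved by the adders.
integrated = output_integrate([[0x4, 0xD], [0x3, 0xC], [0x2, 0xB], [0x1, 0xA]], t=4)
wide = output_integrate([[5], [300]], t=4)
```

Python integers are unbounded, so the zero padding that the hardware needs before shifting has no counterpart here.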
The invention adds a group of input sorting modules and output integration modules to the Karatsuba polynomial multiplication architecture, so that the architecture can be extended both inward and outward, and provides a low-complexity, low-resource, high-bit-width polynomial multiplication method and device based on the Karatsuba architecture. The part outside the input sorting module and the output integration module comprises the Karatsuba preprocessing external blocks and the Karatsuba post-processing external blocks, which realize the function of the polynomial multiplication itself. The part between the input sorting module and the output integration module comprises the Karatsuba preprocessing internal blocks, the central multiplier array, and the Karatsuba post-processing internal blocks; it extends the original Karatsuba structure in depth and optimizes it further while preserving its function.
Furthermore, the invention also provides a key exchange acceleration method, in which all polynomial multiplication operations in the CSIDH key exchange process are realized by the above method for accelerating polynomial multiplication, where the number of multipliers is N, and N is the number of terms of the polynomials involved in the CSIDH key exchange process.
Correspondingly, the invention also provides a key exchange acceleration device, which comprises the above device for accelerating polynomial multiplication.
Beneficial effects: the method and the device of the invention further simplify the high-bit-width polynomial multiplier, so that the multiplication complexity of N-term polynomial multiplication is further reduced, and the ratio of the hardware area to that of the traditional polynomial multiplication algorithm can be smaller than [formula missing].
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a schematic diagram of a hardware architecture for Karatsuba polynomial multiplication.
FIG. 2 is a schematic diagram of the low complexity, low resource, high bit width polynomial Karatsuba multiplication architecture of the present invention.
Fig. 3 is a schematic circuit diagram of an input sorting module.
Fig. 4 is a circuit diagram of an output integration module.
Detailed Description
The invention provides a method and a device for accelerating polynomial multiplication, and in particular a method and a device for accelerating polynomial multiplication based on the Karatsuba architecture, wherein the method comprises:
inputting two sets of polynomial coefficients, each set having $p_1 p_2 \cdots p_m$ terms, where $p_1, p_2, \ldots, p_m$ are the 1st, 2nd, …, m-th prime factors of the number of terms, which may repeat;
processing the two sets of polynomial coefficients with the Karatsuba algorithms with term counts $p_1, p_2, \ldots, p_m$, applying all operations that the data stream undergoes before it reaches the multiplications, to obtain two sets of externally preprocessed data;
performing bit extraction and reordering on each of the two sets of externally preprocessed data to obtain sorted data;
processing the sorted data with the Karatsuba algorithms with term counts $p_{-1}, p_{-2}, \ldots, p_{-k}$, applying all operations that the data stream undergoes before it reaches the multiplications, to obtain two sets of internally preprocessed data, where $p_{-1}, p_{-2}, \ldots, p_{-k}$ are the 1st, 2nd, …, k-th prime factors specified according to the usage requirement;
multiplying the corresponding data in the two sets of internally preprocessed data to obtain a set of preliminary product data;
processing the preliminary product data with the Karatsuba algorithms with term counts $p_{-1}, p_{-2}, \ldots, p_{-k}$, applying all operations that the data stream undergoes after the multiplications, to obtain internally post-processed data;
reordering, shifting, and adding the internally post-processed data to obtain integrated data;
processing the integrated data with the Karatsuba algorithms with term counts $p_1, p_2, \ldots, p_m$, applying all operations that the data stream undergoes after the multiplications, to obtain the final output data, namely the coefficients of the product of the two polynomials with $p_1 p_2 \cdots p_m$ terms.
The invention also provides a device for accelerating polynomial multiplication, which comprises m preprocessing external blocks, an input sorting module, k preprocessing internal blocks, a central multiplier array, k post-processing internal blocks, an output integration module, and m post-processing external blocks, where m and k are positive integers.
The m preprocessing external blocks receive two sets of polynomial coefficients, each set having $p_1 p_2 \cdots p_m$ terms, where $p_1, p_2, \ldots, p_m$ are the 1st, 2nd, …, m-th prime factors of the number of terms, which may repeat; they then process the two sets of polynomial coefficients with the Karatsuba algorithms with term counts $p_1, p_2, \ldots, p_m$, applying all operations that the data stream undergoes before it reaches the multiplications, to obtain two sets of externally preprocessed data;
the input sorting module performs bit extraction and reordering on each of the two sets of externally preprocessed data to obtain sorted data;
the k preprocessing internal blocks process the sorted data with the Karatsuba algorithms with term counts $p_{-1}, p_{-2}, \ldots, p_{-k}$, applying all operations that the data stream undergoes before it reaches the multiplications, to obtain two sets of internally preprocessed data, where $p_{-1}, p_{-2}, \ldots, p_{-k}$ are the 1st, 2nd, …, k-th prime factors specified according to the usage requirement;
the central multiplier array multiplies the corresponding data in the two sets of internally preprocessed data to obtain a set of preliminary product data;
the k post-processing internal blocks process the preliminary product data with the Karatsuba algorithms with term counts $p_{-1}, p_{-2}, \ldots, p_{-k}$, applying all operations that the data stream undergoes after the multiplications, to obtain internally post-processed data;
the output integration module performs reordering, shift operations, and addition operations on the internally post-processed data to obtain integrated data;
the m post-processing external blocks process the integrated data with the Karatsuba algorithms with term counts $p_1, p_2, \ldots, p_m$, applying all operations that the data stream undergoes after the multiplications, to obtain the final output data, namely the coefficients of the product of the two polynomials with $p_1 p_2 \cdots p_m$ terms.
The invention is designed on the basis of a [formula missing]-term Karatsuba structure, where m is the order of the external Karatsuba structure, which corresponds to the number of terms of the polynomials input to the overall structure, and k is the order of the internal Karatsuba structure. The minus sign in the subscript of $p_{-i}$ distinguishes these factors from the others and indicates that their corresponding KA_pre and KA_post functions are used in the internal Karatsuba architecture. Adding a group of input sorting modules and output integration modules to the [formula missing]-term Karatsuba polynomial multiplication structure allows the structure to be extended both inward and outward, forming the modified low-complexity, low-resource, high-bit-width Karatsuba polynomial multiplication structure shown in FIG. 2.
The overall architecture of FIG. 2 is similar to that of FIG. 1 but differs in some details. The two red dashed lines in FIG. 2 are the input sorting module and the output integration module designed in the invention. The blue modules outside the red lines represent the external blocks of the architecture; in sequence toward the red lines they are the KA_pre and KA_post modules of $KA_{p_1}, KA_{p_2}, \ldots, KA_{p_m}$. The yellow modules between the two red lines represent the internal blocks of the architecture; in sequence from the red lines inward they are the KA_pre and KA_post modules of $KA_{p_{-1}}, KA_{p_{-2}}, \ldots, KA_{p_{-k}}$. The polynomial multiplication realized by the external architecture is the function of the whole architecture, while the internal architecture extends the original Karatsuba framework in depth and optimizes further on the basis of the external architecture. In the middle is a row of central multipliers, whose number is determined by the structures of the external and internal Karatsuba modules: if the numbers of central multipliers corresponding to them are $l_1, l_2, \ldots, l_m$ and $l_{-1}, l_{-2}, \ldots, l_{-k}$ respectively, then the number of central multipliers in FIG. 2 is $l_1 l_2 \cdots l_m \cdot l_{-1} l_{-2} \cdots l_{-k}$. The subscripts of KA, KA_pre, and KA_post represent the number of terms of the corresponding layer of the Karatsuba polynomial multiplication architecture.
The input sorting algorithm and the output integration algorithm are shown as Algorithm 2 and Algorithm 3, and the circuit schematics of the input sorting module and the output integration module are shown in FIG. 3 and FIG. 4. Algorithm 2 and Algorithm 3 introduce a new parameter t, which must satisfy [condition missing] and be as small as possible.
Algorithm 2: Input sorting algorithm [algorithm listing missing]
Algorithm 3: Output integration algorithm [algorithm listing missing]
(In these algorithms, the slice notation, whose symbol is missing from this text, denotes the bits from position (j-1)t to position jt-1 of an integer in binary representation. The subscripts of a_i and b_i carry a single index and those of a_o and b_o carry two indices; likewise, the subscript of c_o carries a single index and that of c_i carries two. These subscript conventions serve only to distinguish the data.)
Algorithm 2 and FIG. 3 show that the input sorting module comprises a set of functional blocks that cut the input data into bit slices and a set of circuits that reorder and combine the resulting data sequence. Algorithm 3 and FIG. 4 show that the output integration module comprises a set of circuits that rearrange and combine the input data sequence, an array of shift modules, and an addition array (the row of trapezoidal blocks in FIG. 4). The input sorting module and the output integration module play two roles in the circuit. The first is to convert the length of the coefficient vector, from the input/output vector length of the central multipliers of the [formula missing]-term Karatsuba structure to the length of the data vector transmitted between the m-th and the (m+1)-th layers of pre- or post-processing, counted from outside to inside, in the [formula missing]-term Karatsuba architecture. The second is to reduce the bit width of each value in transit while increasing the number of terms, which makes it convenient to extend the Karatsuba architecture in both directions and optimize it further.
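The second role can be illustrated for a single pair of wide coefficients: cutting each operand into t-bit slices turns one wide integer multiplication into a polynomial multiplication over narrow values, and shifting and adding the result restores the integer product. The Python sketch below shows only that principle. It uses a plain schoolbook product as a stand-in for the internal Karatsuba blocks and the central multiplier array, and all function names are hypothetical.

```python
def slice_integer(value, t, num_slices):
    """Cut a non-negative integer into t-bit slices, lowest slice first."""
    mask = (1 << t) - 1
    return [(value >> (j * t)) & mask for j in range(num_slices)]


def multiply_slices(a_slices, b_slices):
    """Polynomial product of the slice vectors (stand-in for the internal blocks)."""
    out = [0] * (len(a_slices) + len(b_slices) - 1)
    for i, x in enumerate(a_slices):
        for j, y in enumerate(b_slices):
            out[i + j] += x * y  # each x * y is a narrow t-bit by t-bit product
    return out


def recombine(coeffs, t):
    """Shift coefficient j left by j*t bits and add everything up."""
    return sum(c << (j * t) for j, c in enumerate(coeffs))


def multiply_wide(a, b, t=16, num_slices=4):
    """Multiply two integers of at most t*num_slices bits via t-bit slices."""
    limit = 1 << (t * num_slices)
    assert 0 <= a < limit and 0 <= b < limit, "operand does not fit the slices"
    return recombine(
        multiply_slices(slice_integer(a, t, num_slices),
                        slice_integer(b, t, num_slices)),
        t,
    )


a, b = 0x0123456789ABCDEF, 0xFEDCBA9876543210
result = multiply_wide(a, b)
```

With t = 16 and four slices, as in the 64-bit example of the embodiment, every elementary product is 16 bits by 16 bits; replacing `multiply_slices` with a two-layer $KA_2$ recursion would reduce the 16 narrow products to 9.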
Take a 4-term polynomial multiplication unit as an example (N = 4, so according to $N = p_1 p_2 \cdots p_m$, take m = 2 and $p_1 = p_2 = 2$), with the polynomial coefficient bit width set to 64. Table 1 compares the FPGA resources/area of three designs: a multiplication unit using conventional polynomial multiplication, a multiplication unit using conventional Karatsuba polynomial multiplication, and the low-complexity, low-resource, high-bit-width polynomial multiplication unit based on the Karatsuba architecture designed in this scheme (k = 2, t = 16, $p_{-1} = p_{-2} = 2$).
TABLE 1
In this embodiment, the EDA platform used for simulation, synthesis, and implementation is Vivado 2021.1, and the selected FPGA is a Xilinx Virtex-7 xc7vx690tffg1157-3. In the above data, #Slices and #DSPs are obtained directly after synthesis and implementation, while #SEC is a calculated figure that represents hardware resource consumption or area, computed as:
#SEC=#BRAMs×100+#DSPs×100+#Slices
where #BRAMs is 0 by default, since no BRAM is used in any of the three multipliers. Theoretically, the lower limit of the ratio of the hardware area of Karatsuba polynomial multiplication to that of the conventional polynomial multiplication algorithm is [formula missing]; in the above example this limit is [value missing]. Table 1 shows that the conventional Karatsuba method is slightly above this limit, whereas the present scheme is below it.
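The #SEC metric defined above is simple enough to compute directly. The sketch below is illustrative only: the function name is hypothetical, and the sample inputs are made-up placeholders, not values from Table 1 (whose contents are missing from this text).

```python
def sec(slices, dsps, brams=0):
    """#SEC = #BRAMs * 100 + #DSPs * 100 + #Slices."""
    return brams * 100 + dsps * 100 + slices


# Hypothetical resource counts, chosen only to exercise the formula.
example_sec = sec(slices=1000, dsps=81)
```

Because each DSP and each BRAM is weighted as 100 slices, a design that trades DSPs for a modest number of extra slices still scores lower on this metric.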
The embodiment also provides a CSIDH key exchange acceleration method, which includes: the polynomial multiplication operation in the CSIDH key exchange process is realized by the polynomial multiplication acceleration method.
Further, the number of multipliers is N, where N is the number of terms of the polynomial involved in the CSIDH key exchange process, and in an operation environment of 64-bit integers, N is 8 in the CSIDH key exchange process using the CSIDH512 parameter set, N is 16 in the CSIDH key exchange process using the CSIDH1024 parameter set, and N is 32 in the CSIDH key exchange process using the CSIDH2048 parameter set.
The CSIDH key exchange process can involve polynomial multiplication operation of multiple degrees, the polynomial multiplication operation of each degree is the same, and the number N of the multipliers is the number of terms of different polynomials corresponding to the CSIDH key exchange process with different parameters.
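The values of N listed above are consistent with dividing the operand width by the 64-bit word size. The following sketch assumes that the number in each parameter-set name can be taken as the operand width in bits; the helper `num_terms` and the dictionary `PARAMETER_SETS` are hypothetical names introduced for illustration.

```python
def num_terms(operand_bits, word_bits=64):
    """Number of word_bits-wide terms needed to hold an operand_bits-wide operand."""
    return -(-operand_bits // word_bits)  # ceiling division


# Assumption: the number in each parameter-set name is used as the operand width in bits.
PARAMETER_SETS = {"CSIDH512": 512, "CSIDH1024": 1024, "CSIDH2048": 2048}
terms = {name: num_terms(bits) for name, bits in PARAMETER_SETS.items()}
print(terms)
```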
Correspondingly, the embodiment of the invention also provides a CSIDH encryption and decryption acceleration device, which comprises the acceleration device for polynomial multiplication.
The CSIDH key exchange acceleration method and apparatus provided in this embodiment can improve the efficiency of the CSIDH key exchange process on the basis of reducing the resource consumption of the FPGA hardware implementation of the CSIDH.
In a specific implementation, the present application provides a computer storage medium and a corresponding data processing unit, where the computer storage medium is capable of storing a computer program, and the computer program, when executed by the data processing unit, may execute the inventive content of the method for accelerating polynomial multiplication and some or all of the steps in each embodiment provided in the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like.
It is clear to those skilled in the art that the technical solutions in the embodiments of the present invention can be implemented by means of a computer program and its corresponding general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present invention may be embodied, in whole or in part, in the form of a computer program or software product, which may be stored in a storage medium and includes instructions that cause a device comprising a data processing unit (which may be a personal computer, a server, a single-chip microcomputer, an MCU, or a network device) to execute the method described in the embodiments, or in certain parts of the embodiments, of the present invention.
The present invention provides a method and an apparatus for accelerating polynomial multiplication, and there are many methods and approaches for implementing this technical solution. The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art may make a number of improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention. Any component not specified in this embodiment can be implemented with the prior art.
Claims (10)
1. A method for accelerating polynomial multiplication, comprising:
inputting two groups of polynomial coefficients, the number of terms in each group being $\prod_{i=1}^{m} p_i$, where $p_1, p_2, \ldots, p_m$ are the 1st, 2nd, ..., m-th prime factors (repetition allowed) of the number of terms;
processing the two groups of polynomial coefficients according to all operation rules applied to the data stream before it reaches the multiplication operations in the Karatsuba algorithm with $\prod_{i=1}^{m} p_i$ terms, to obtain two groups of externally preprocessed data;
performing bit extraction and reordering on the two groups of externally preprocessed data respectively to obtain sorted data;
processing the sorted data according to all operation rules applied to the data stream before it reaches the multiplication operations in the Karatsuba algorithm with $\prod_{j=1}^{k} p_{-j}$ terms, to obtain two groups of internally preprocessed data, where $p_{-1}, p_{-2}, \ldots, p_{-k}$ are the 1st, 2nd, ..., k-th prime factors designated according to use requirements;
multiplying corresponding data in the two groups of internally preprocessed data to obtain a group of preliminary product data;
processing the preliminary product data according to all operation rules applied to the data stream after the multiplication operations in the Karatsuba algorithm with $\prod_{j=1}^{k} p_{-j}$ terms, to obtain internally post-processed data;
reordering, shifting, and adding the internally post-processed data to obtain integrated data;
2. An accelerating device for polynomial multiplication, characterized by comprising m preprocessing external chunks, an input sorting module, k preprocessing internal chunks, a group of central multiplier arrays, k post-processing internal chunks, an output integration module, and m post-processing external chunks, wherein m and k are positive integers;
the m preprocessing external chunks are used for receiving two groups of polynomial coefficients, the number of terms in each group being $\prod_{i=1}^{m} p_i$, where $p_1, p_2, \ldots, p_m$ are the 1st, 2nd, ..., m-th prime factors (repetition allowed) of the number of terms, and for processing the two groups of polynomial coefficients according to all operation rules applied to the data stream before it reaches the multiplication operations in the Karatsuba algorithm with $\prod_{i=1}^{m} p_i$ terms, to obtain two groups of externally preprocessed data;
the input sorting module is used for performing bit extraction and reordering on the two groups of externally preprocessed data respectively to obtain sorted data;
the k preprocessing internal chunks are used for processing the sorted data according to all operation rules applied to the data stream before it reaches the multiplication operations in the Karatsuba algorithm with $\prod_{j=1}^{k} p_{-j}$ terms, to obtain two groups of internally preprocessed data, where $p_{-1}, p_{-2}, \ldots, p_{-k}$ are the 1st, 2nd, ..., k-th prime factors designated according to use requirements;
the group of central multiplier arrays is used for multiplying corresponding data in the two groups of internally preprocessed data to obtain a group of preliminary product data;
the k post-processing internal chunks are used for processing the preliminary product data according to all operation rules applied to the data stream after the multiplication operations in the Karatsuba algorithm with $\prod_{j=1}^{k} p_{-j}$ terms, to obtain internally post-processed data;
the output integration module is used for reordering, shifting, and adding the internally post-processed data to obtain integrated data;
the m post-processing external chunks are used for processing the integrated data according to all operation rules applied to the data stream after the multiplication operations in the Karatsuba algorithm with $\prod_{i=1}^{m} p_i$ terms, to obtain the final output data, namely the polynomial coefficients of the product.
3. The apparatus of claim 2, wherein the m preprocessing external chunks are each a KA_pre module;
wherein the external chunks correspond respectively to Karatsuba algorithm modules with $p_1, p_2, \ldots, p_m$ terms;
wherein the internal chunks correspond respectively to Karatsuba algorithm modules with $p_{-1}, p_{-2}, \ldots, p_{-k}$ terms, based on the 1st, 2nd, ..., k-th designated prime factors $p_{-1}, p_{-2}, \ldots, p_{-k}$;
wherein a KA_pre module denotes the hardware that performs all operations the data stream undergoes from the input up to the multiplication operations in the Karatsuba algorithm;
wherein a KA_post module denotes the hardware that performs all operations the data stream undergoes after the multiplication operations up to the output in the Karatsuba algorithm.
4. The apparatus of claim 3, wherein the central multiplier array comprises a plurality of integer multipliers whose number is determined by the structures of the external and internal Karatsuba algorithm modules: if the numbers of central multipliers corresponding to these modules are $l_1, l_2, \ldots, l_m$ and $l_{-1}, l_{-2}, \ldots, l_{-k}$ respectively, then the number of central multipliers is $\prod_{i=1}^{m} l_i \cdot \prod_{j=1}^{k} l_{-j}$.
5. The apparatus of claim 4, wherein the input sorting module is configured to execute an input sorting algorithm that:
……
……
wherein the input symbols denote the 1st, 2nd, ... binary integer data in the first group and in the second group of input data of the input sorting module;
the output symbols of the first group denote the 1st, 2nd, ... binary integer data in the first, second, ... subgroups of the first group of output data of the input sorting module;
the output symbols of the second group denote the 1st, 2nd, ... binary integer data in the first, second, ... subgroups of the second group of output data of the input sorting module.
6. The apparatus of claim 5, wherein the output integration module is configured to perform an output integration algorithm that:
……
wherein the input symbols denote the 1st, 2nd, ... binary integer data in the first group, the second group, ..., and the last group of input data of the output integration module;
7. The apparatus of claim 6, wherein the input sorting module comprises a sorting module and an input reordering module;
the sorting module takes, from each number in the two groups of binary integer data, bits 0 to t-1, bits t to 2t-1, ..., counted from low to high, and forms each slice into a new integer, where t is an integer set according to use requirements; the new integers obtained from each initial datum are grouped together to form new arrays;
the input reordering module takes the 1st, 2nd, ... data from each of the leading new arrays and splices them into new 1st, 2nd, ... arrays, and likewise takes the 1st, 2nd, ... data from each of the trailing new arrays and splices them into new arrays.
8. The apparatus of claim 7, wherein the output integration module comprises an output reordering module, a shift module array, and an addition array;
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211245657.5A CN115587274A (en) | 2022-10-12 | 2022-10-12 | Polynomial multiplication accelerating method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211245657.5A CN115587274A (en) | 2022-10-12 | 2022-10-12 | Polynomial multiplication accelerating method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115587274A true CN115587274A (en) | 2023-01-10 |
Family
ID=84780700
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211245657.5A Pending CN115587274A (en) | 2022-10-12 | 2022-10-12 | Polynomial multiplication accelerating method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115587274A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||