CN115587274A

CN115587274A - Polynomial multiplication accelerating method and device

Info

Publication number: CN115587274A
Application number: CN202211245657.5A
Authority: CN
Inventors: 王中风; 张灏辰; 田静
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2022-10-12
Filing date: 2022-10-12
Publication date: 2023-01-10

Abstract

The invention provides a method and a device for accelerating polynomial multiplication, wherein the device comprises m preprocessing external chunks, an input sorting module, k preprocessing internal chunks, a group of central multiplier arrays, k post-processing internal chunks, an output integration module and m post-processing external chunks.

Description

Polynomial multiplication accelerating method and device

Technical Field

The invention relates to a method and a device for accelerating polynomial multiplication.

Background

In digital signal processing, cryptography, coding theory and related fields, two polynomials often need to be multiplied quickly. The cycle count, total latency and resource consumption of the polynomial multiplication are important factors determining the area efficiency of the overall hardware architecture in these applications, so many practical optimization methods for polynomial multiplication have been proposed.

Since it was proposed in 1962 (reference: Karatsuba, Anatolii. "Multiplication of Multidigit Numbers on Automata." Soviet Physics Doklady 7, 595), the Karatsuba algorithm has for decades been regarded as one of the best ways to reduce the complexity of polynomial multiplication. For the multiplication of two N-term polynomials, it reduces the multiplication complexity from $N^2$ to $N^{\log_2 3}$, with an addition complexity of no more than $6N^{\log_2 3} - 8N + 2$.

In practice, however, polynomial multiplications with wide coefficients are sometimes required; for example, in research on elliptic curves, this arises in modular multiplication over Galois fields. Usually a conventional multiplier, or a multiplier IP core provided by the FPGA, is used as the central multiplier. When the coefficient bit width reaches tens or hundreds of bits, however, it may exceed the range supported by the multiplier IP, and a conventional multiplier design incurs excessive computational complexity and hardware area, so that the polynomial multiplier degrades the performance of the whole hardware implementation.

There are many implementations of polynomial multiplication and integer multiplication based on the Karatsuba algorithm. For two two-term polynomials $A(x) = a_0 + a_1 x$ and $B(x) = b_0 + b_1 x$, the classical multiplication algorithm is:

$$C(x) = a_0 b_0 + (a_0 b_1 + a_1 b_0)x + a_1 b_1 x^2$$

This algorithm requires four multiplications and one addition. The two-term polynomial multiplication algorithm based on the Karatsuba algorithm, $KA_2$, is:

$$C(x) = a_0 b_0 + ((a_0 + a_1)(b_0 + b_1) - a_0 b_0 - a_1 b_1)x + a_1 b_1 x^2$$

This algorithm requires three multiplications and four additions. Given that the latency and resource consumption of multiplication are far higher than those of addition, the algorithm reduces the complexity of two-term multiplication. Based on two-term Karatsuba multiplication, a recursive $2^n$-term Karatsuba algorithm can be obtained for fast multiplication of two $2^n$-term polynomials; it is given as Algorithm 1, in which the quantities involved are unsigned integers (including 0).

Algorithm 1: Recursive Karatsuba $2^n$-term polynomial multiplication algorithm
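The recursion described above can be illustrated with a minimal Python sketch. It is not the patent's Algorithm 1 listing; the function name `karatsuba_poly` and the one-element `counter` list used to count coefficient multiplications are assumptions made for illustration.

```python
def karatsuba_poly(a, b, counter):
    """Multiply two coefficient lists of equal power-of-two length.

    `counter` is a one-element list (hypothetical helper) that records
    how many coefficient multiplications were performed.
    """
    n = len(a)
    if n == 1:
        counter[0] += 1                      # one coefficient multiplication
        return [a[0] * b[0]]
    h = n // 2
    a_lo, a_hi = a[:h], a[h:]
    b_lo, b_hi = b[:h], b[h:]
    lo = karatsuba_poly(a_lo, b_lo, counter)             # A_lo * B_lo
    hi = karatsuba_poly(a_hi, b_hi, counter)             # A_hi * B_hi
    a_sum = [x + y for x, y in zip(a_lo, a_hi)]          # pre-additions
    b_sum = [x + y for x, y in zip(b_lo, b_hi)]
    mid = karatsuba_poly(a_sum, b_sum, counter)
    mid = [m - l - u for m, l, u in zip(mid, lo, hi)]    # post-subtractions
    result = [0] * (2 * n - 1)
    for i, v in enumerate(lo):
        result[i] += v
    for i, v in enumerate(mid):
        result[i + h] += v
    for i, v in enumerate(hi):
        result[i + 2 * h] += v
    return result
```

For two $2^n$-term inputs, the counter ends at $3^n$, compared with $4^n$ coefficient multiplications for the schoolbook method.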

By calculation, the algorithm reduces the multiplication complexity from the $4^n$ of the conventional algorithm to $3^n$, with an addition complexity of no more than $2 \cdot 3^{n+1} - 2^{n+3} + 2$. Besides the $2^n$-term Karatsuba polynomial multiplication, there are also 3-, 5- and 7-term Karatsuba algorithms, from which Karatsuba polynomial multiplication for an arbitrary integer number of terms can be built by a recursion similar to the one above. The reference Weimerskirch, André and Christof Paar. "Generalizations of the Karatsuba Algorithm for Efficient Implementations." IACR Cryptol. ePrint Arch. 2006 (2006): 224 also shows that, for any positive integer N, the ratio of the hardware area of Karatsuba polynomial multiplication to that of conventional polynomial multiplication is bounded from below.

Disclosure of Invention

Object of the invention: the technical problem to be solved by the present invention is to provide a method and a device for accelerating polynomial multiplication, in particular based on the Karatsuba architecture. The method comprises:

inputting two sets of polynomial coefficients, the number of terms of each set being $N = p_1 p_2 \cdots p_m$, wherein $p_1, p_2, \ldots, p_m$ are the 1st, 2nd, ..., m-th prime factors of the number of terms (repeated factors allowed);

processing the two sets of polynomial coefficients according to all operation rules that the data stream undergoes before reaching the multiplication operation in the Karatsuba algorithms with term counts $p_1, p_2, \ldots, p_m$, to obtain two sets of externally preprocessed data;

performing bit slicing and reordering on the two sets of externally preprocessed data respectively, to obtain sorted data;

processing the sorted data according to all operation rules that the data stream undergoes before reaching the multiplication operation in the Karatsuba algorithms with term counts $p_{-1}, p_{-2}, \ldots, p_{-k}$, to obtain two sets of internally preprocessed data, wherein $p_{-1}, p_{-2}, \ldots, p_{-k}$ are the 1st, 2nd, ..., k-th prime factors designated according to the usage requirement. The usage requirement is determined by the multiplier area (resources) acceptable to the user: for example, for a 4-term 64-bit polynomial multiplication, a conventional multiplier uses 256 DSPs, conventional Karatsuba uses 144 DSPs, setting $p_{-1} = 2$ uses 108 DSPs, and setting $p_{-1} = p_{-2} = 2$ uses 81 DSPs;

multiplying corresponding data in the two sets of internally preprocessed data, to obtain a set of preliminary product data;

processing the preliminary product data according to all operation rules that the data stream undergoes after the multiplication operation in the Karatsuba algorithms with term counts $p_{-1}, p_{-2}, \ldots, p_{-k}$, to obtain internally post-processed data;

reordering, shifting and adding the internally post-processed data, to obtain integrated data;

processing the integrated data according to all operation rules that the data stream undergoes after the multiplication operation in the Karatsuba algorithms with term counts $p_1, p_2, \ldots, p_m$, to obtain the final output data, namely the polynomial coefficients of the product of the two input polynomials.

The invention also provides an accelerating device for polynomial multiplication, which comprises m preprocessing external chunks, an input sorting module, k preprocessing internal chunks, a group of central multiplier arrays, k post-processing internal chunks, an output integration module and m post-processing external chunks, wherein m and k are positive integers.

The m preprocessing external chunks receive two sets of polynomial coefficients, the number of terms of each set being $N = p_1 p_2 \cdots p_m$, wherein $p_1, p_2, \ldots, p_m$ are the 1st, 2nd, ..., m-th prime factors of the number of terms (repeated factors allowed); they then process the two sets of polynomial coefficients according to all operation rules that the data stream undergoes before reaching the multiplication operation in the Karatsuba algorithms with term counts $p_1, p_2, \ldots, p_m$, to obtain two sets of externally preprocessed data;

the input sorting module performs bit slicing and reordering on the two sets of externally preprocessed data respectively, to obtain sorted data;

the k preprocessing internal chunks process the sorted data according to all operation rules that the data stream undergoes before reaching the multiplication operation in the Karatsuba algorithms with term counts $p_{-1}, p_{-2}, \ldots, p_{-k}$, to obtain two sets of internally preprocessed data, wherein $p_{-1}, p_{-2}, \ldots, p_{-k}$ are the 1st, 2nd, ..., k-th prime factors designated according to the usage requirement;

the group of central multiplier arrays multiplies corresponding data in the two sets of internally preprocessed data, to obtain a set of preliminary product data;

the k post-processing internal chunks process the preliminary product data according to all operation rules that the data stream undergoes after the multiplication operation in the Karatsuba algorithms with term counts $p_{-1}, p_{-2}, \ldots, p_{-k}$, to obtain internally post-processed data;

the output integration module performs reordering, shift operations and addition operations on the internally post-processed data, to obtain integrated data;

the m post-processing external chunks process the integrated data according to all operation rules that the data stream undergoes after the multiplication operation in the Karatsuba algorithms with term counts $p_1, p_2, \ldots, p_m$, to obtain the final output data, namely the polynomial coefficients of the product of the two input polynomials.

The m preprocessing external chunks are the KA_pre modules of $KA_{p_1}, KA_{p_2}, \ldots, KA_{p_m}$;

the k preprocessing internal chunks are the KA_pre modules of $KA_{p_{-1}}, KA_{p_{-2}}, \ldots, KA_{p_{-k}}$;

the k post-processing internal chunks are the KA_post modules of $KA_{p_{-1}}, KA_{p_{-2}}, \ldots, KA_{p_{-k}}$;

the m post-processing external chunks are the KA_post modules of $KA_{p_1}, KA_{p_2}, \ldots, KA_{p_m}$;

wherein $KA_{p_1}, KA_{p_2}, \ldots, KA_{p_m}$ respectively denote the Karatsuba algorithm modules with term counts $p_1, p_2, \ldots, p_m$, and $KA_{p_{-1}}, KA_{p_{-2}}, \ldots, KA_{p_{-k}}$ respectively denote the Karatsuba algorithm modules whose term counts are the 1st, 2nd, ..., k-th prime factors $p_{-1}, p_{-2}, \ldots, p_{-k}$ designated according to the usage requirement; for both, see the references Weimerskirch, André and Christof Paar. "Generalizations of the Karatsuba Algorithm for Efficient Implementations." IACR Cryptol. ePrint Arch. 2006 (2006): 224 and Montgomery, Peter L. "Five, six, and seven-term Karatsuba-like formulae." IEEE Transactions on Computers 54 (2005): 362-369;

wherein a KA_pre module denotes the hardware that performs all operations the data stream undergoes from the input up to the multiplication operations in the Karatsuba algorithm;

and a KA_post module denotes the hardware that performs all operations the data stream undergoes from after the multiplication operations to the output in the Karatsuba algorithm.

The central multiplier array comprises a plurality of integer multipliers, the number of which is determined by the structures of $KA_{p_1}, KA_{p_2}, \ldots, KA_{p_m}$ and $KA_{p_{-1}}, KA_{p_{-2}}, \ldots, KA_{p_{-k}}$: if the numbers of central multipliers corresponding to these structures are $l_1, l_2, \ldots, l_m$ and $l_{-1}, l_{-2}, \ldots, l_{-k}$ respectively, then the number of central multipliers is $\prod_{i=1}^{m} l_i \cdot \prod_{j=1}^{k} l_{-j}$.
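As a rough illustration of how the multiplier count follows from the per-layer counts, the sketch below multiplies the $l_i$ and $l_{-j}$ values and derives a DSP estimate. The function names and the cost model are assumptions, not part of the source: one DSP per t-by-t-bit product is assumed, limbs not absorbed by the inner Karatsuba layers are multiplied schoolbook-style, and bit growth from Karatsuba pre-additions is ignored.

```python
from math import prod

def central_multiplier_count(outer_l, inner_l):
    """Product of the per-layer multiplier counts l_1..l_m and l_-1..l_-k."""
    return prod(outer_l) * prod(inner_l)

def dsp_estimate(outer_l, inner_l, inner_p, coeff_bits, t):
    """Rough DSP count, ASSUMING one DSP per t-by-t-bit product.

    Limbs not consumed by the inner Karatsuba layers are multiplied
    schoolbook-style, costing remaining**2 products per central multiplier.
    Bit growth from Karatsuba pre-additions is ignored (assumption).
    """
    limbs = -(-coeff_bits // t)              # ceil(coeff_bits / t)
    inner_terms = prod(inner_p)              # limbs absorbed by inner layers
    if limbs % inner_terms:
        raise ValueError("limb count must be divisible by the inner term count")
    remaining = limbs // inner_terms
    return central_multiplier_count(outer_l, inner_l) * remaining ** 2
```

Under these assumptions, with 64-bit coefficients and t = 16, the model gives 256 for two schoolbook layers (`outer_l=[4, 4]`), 144 for two 2-term Karatsuba layers (`outer_l=[3, 3]`), 108 with one inner layer (`inner_l=[3]`, `inner_p=[2]`) and 81 with two inner layers, matching the DSP figures quoted in the method description.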

The input sorting module is configured to execute an input sorting algorithm whose inputs and outputs are as follows:

$a\_i_0, a\_i_1, \ldots$ denote the 1st, 2nd, ... binary integer data in the first group of input data of the input sorting module, and $b\_i_0, b\_i_1, \ldots$ denote the 1st, 2nd, ... binary integer data in the second group of input data;

$a\_o_{00}, a\_o_{01}, \ldots$ denote the 1st, 2nd, ... binary integer data in the first subgroup of the first group of output data of the input sorting module, $a\_o_{10}, a\_o_{11}, \ldots$ denote the 1st, 2nd, ... binary integer data in the second subgroup, and so on through the last subgroup of the first group of output data;

$b\_o_{00}, b\_o_{01}, \ldots$ denote the 1st, 2nd, ... binary integer data in the first subgroup of the second group of output data of the input sorting module, $b\_o_{10}, b\_o_{11}, \ldots$ denote the 1st, 2nd, ... binary integer data in the second subgroup, and so on through the last subgroup of the second group of output data.

The output integration module is configured to execute an output integration algorithm whose inputs and outputs are as follows:

$c\_i_{00}, c\_i_{01}, \ldots$ denote the 1st, 2nd, ... binary integer data in the first group of input data of the output integration module, $c\_i_{10}, c\_i_{11}, \ldots$ denote the 1st, 2nd, ... binary integer data in the second group of input data, and so on through the last group of input data;

$c\_o_0, c\_o_1, \ldots$ denote the 1st, 2nd, ... binary positive integer data in the output data of the output integration module.

The input sorting module comprises a sorting module and an input reordering module;

the sorting module takes, from each number in the two groups of binary integer data, bits 0 to t-1, bits t to 2t-1, and so on from low to high, each t-bit slice forming a new integer, wherein t is an integer set according to the usage requirement; the new integers obtained from each initial datum form one group, giving a set of new arrays;

the input reordering module takes the new arrays derived from the first input group, extracts the 1st, 2nd, ... data of every such array, and splices the data with the same index together into new 1st, 2nd, ... arrays; it processes the new arrays derived from the second input group in the same way.
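The bit slicing and reordering described above can be sketched in a few lines of Python. This is an illustrative model only: the function names, the `count` parameter (number of t-bit slices per value) and the exact output ordering are assumptions, since the source's algorithm listing is not reproduced in this text.

```python
def slice_bits(x, t, count):
    """Split integer x into `count` slices of t bits, lowest slice first."""
    mask = (1 << t) - 1
    return [(x >> (j * t)) & mask for j in range(count)]

def input_sort(values, t, count):
    """Slice every value, then regroup so that array j holds slice j of each value."""
    sliced = [slice_bits(v, t, count) for v in values]   # one array per input value
    return [list(column) for column in zip(*sliced)]     # transpose: one array per slice index
```

For example, with t = 8 and count = 2, the inputs 0x1234 and 0xABCD are sliced into [0x34, 0x12] and [0xCD, 0xAB] and then regrouped into [0x34, 0xCD] and [0x12, 0xAB]. In the device the same step would be applied to each of the two input groups.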

The output integration module comprises an output reordering module, a shift module array and an addition array;

the output reordering module extracts the 1st, 2nd, ... data of each of its input arrays and splices the data with the same index together into new 1st, 2nd, ... arrays;

the shift module array zero-pads the high-order bits of the 1st, 2nd, ... data in each reordered array and then shifts them left, by shift registers, by 0, t, 2t, ... bits respectively, to obtain new data;

the addition array adds all the shifted data within each array by adders to obtain one sum per array, and the sums of all arrays are output together as the output data of the addition array.
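A minimal Python sketch of the reorder, shift and add steps is given below. The function name and the input layout (`arrays[j][i]` holding the datum that belongs at slice position j of output i) are assumptions chosen for illustration; hardware zero-padding is implicit because Python integers have arbitrary width.

```python
def output_integrate(arrays, t):
    """Regroup data per output, shift slice j left by j*t bits, and sum.

    arrays[j][i] is assumed to hold the datum for slice position j of output i.
    """
    regrouped = [list(column) for column in zip(*arrays)]    # one array per output value
    return [
        sum(value << (j * t) for j, value in enumerate(group))
        for group in regrouped
    ]
```

Because the data entering this stage are products, they can be wider than t bits; the shifted values then overlap, and the addition resolves the carries. For instance, `output_integrate([[0x1FF], [0x01]], 8)` yields 0x2FF.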

The invention adds an input sorting module and an output integration module to the Karatsuba polynomial multiplication architecture, so that the architecture can be extended both inward and outward, and thereby provides a low-complexity, low-resource, high-bit-width polynomial multiplication method and device based on the Karatsuba architecture. The part outside the input sorting module and the output integration module comprises the Karatsuba preprocessing external chunks and the Karatsuba post-processing external chunks, which realize the function of the polynomial multiplication itself. The part between the input sorting module and the output integration module comprises the Karatsuba preprocessing internal chunks, the central multiplier array and the Karatsuba post-processing internal chunks; it extends the original Karatsuba structure in depth and optimizes it further while preserving its function.

Furthermore, the invention also provides a key exchange acceleration method, and polynomial multiplication operations in the CSIDH key exchange process are all realized by the acceleration method of polynomial multiplication, wherein the number of multipliers is N, and N is the term number of the polynomial involved in the CSIDH key exchange process.

Correspondingly, the invention also provides a key exchange accelerating device, which comprises the accelerating device for polynomial multiplication.

Beneficial effects: the method and the device of the invention further simplify the high-bit-width polynomial multiplier, so that the multiplication complexity of N-term polynomial multiplication is further reduced, and the ratio of the hardware area to that of the conventional polynomial multiplication algorithm can fall below the lower bound that applies to conventional Karatsuba polynomial multiplication.

Drawings

The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.

FIG. 1 is a schematic diagram of a hardware architecture for Karatsuba polynomial multiplication.

FIG. 2 is a schematic diagram of the low complexity, low resource, high bit width polynomial Karatsuba multiplication architecture of the present invention.

Fig. 3 is a schematic circuit diagram of an input sorting module.

Fig. 4 is a circuit diagram of an output integration module.

Detailed Description

The invention provides a method and a device for accelerating polynomial multiplication, in particular based on a Karatsuba architecture. The method steps and the device modules are those set out in the Disclosure of Invention above: external preprocessing, input sorting, internal preprocessing, central multiplication, internal post-processing, output integration and external post-processing.

The invention is designed on the basis of an $N$-term Karatsuba structure with $N = p_1 p_2 \cdots p_m$, wherein $N$ is the number of terms of the input polynomials of the overall structure, m is the order of the external Karatsuba structure and k is the order of the internal Karatsuba structure. The minus sign in the subscript of $p_{-i}$ distinguishes it from the other subscripts and also indicates that the corresponding KA_pre and KA_post functions are used in the internal Karatsuba architecture. By adding an input sorting module and an output integration module to the $N$-term Karatsuba polynomial multiplication structure, the structure can be extended both inward and outward, forming the modified low-complexity, low-resource, high-bit-width Karatsuba polynomial multiplication structure shown in FIG. 2.

The overall architecture of FIG. 2 is similar to that of FIG. 1 but differs in some details. The two red dotted lines in FIG. 2 are the input sorting module and the output integration module designed by the present invention. The blue modules outside the two red lines represent the external chunks of the architecture, which are, in order toward the red lines, the KA_pre and KA_post modules of $KA_{p_1}, KA_{p_2}, \ldots, KA_{p_m}$; the yellow modules between the two red lines represent the internal chunks, which are, in order from the red lines inward, the KA_pre and KA_post modules of $KA_{p_{-1}}, KA_{p_{-2}}, \ldots, KA_{p_{-k}}$. The polynomial multiplication realized by the external architecture is the function of the whole architecture, while the internal architecture extends the original Karatsuba architecture in depth and optimizes it further on top of the external architecture. In the middle is a row of central multipliers, whose number is determined by the structures of $KA_{p_1}, \ldots, KA_{p_m}$ and $KA_{p_{-1}}, \ldots, KA_{p_{-k}}$: if the numbers of central multipliers corresponding to these structures are $l_1, l_2, \ldots, l_m$ and $l_{-1}, l_{-2}, \ldots, l_{-k}$ respectively, then the number of central multipliers in FIG. 2 is $\prod_{i=1}^{m} l_i \cdot \prod_{j=1}^{k} l_{-j}$. The subscripts of KA, KA_pre and KA_post denote the number of terms of the corresponding layer of the Karatsuba polynomial multiplication architecture.

The input sorting algorithm and the output integration algorithm are given as Algorithm 2 and Algorithm 3, and the circuit schematics of the input sorting module and the output integration module are shown in FIG. 3 and FIG. 4. Algorithms 2 and 3 introduce a new parameter t, which must satisfy a constraint (the formula is not reproduced in this text) and is otherwise chosen as small as possible.

Algorithm 2: Input sorting algorithm

Algorithm 3: Output integration algorithm

In Algorithm 2, the slice notation denotes bits jt-1 down to (j-1)t of an integer in its binary representation. The subscripts of a_i and b_i have a single index, and the subscripts of a_o and b_o have two indices, for distinction only. Likewise, the subscript of c_o has a single index and the subscript of c_i has two indices, for distinction only.

As Algorithm 2 and FIG. 3 show, the input sorting module comprises a set of functional blocks that slice the input data by bit position and a set of circuits that reorder and recombine the resulting data sequence. As Algorithm 3 and FIG. 4 show, the output integration module comprises a set of circuits that reorder and recombine the input data sequence, a shift module array, and an addition array (the row of trapezoidal blocks in FIG. 4). The input sorting module and the output integration module play two roles in the circuit. First, they convert the length of the coefficient vector, from the input/output vector length of the central multipliers of the external Karatsuba structure to the length of the data vector passed between the m-th and (m+1)-th layers of pre- or post-processing, counted from outside to inside in the Karatsuba architecture. Second, they reduce the bit width of each value in transit while increasing the number of terms, which allows the Karatsuba architecture to be extended in both directions and optimized further.

Take a 4-term polynomial multiplication unit as an example ($N = 4$, so from $N = p_1 p_2 \cdots p_m$ one takes $m = 2$, $p_1 = p_2 = 2$), with the polynomial coefficient bit width set to 64. Table 1 compares the FPGA resources/area of three designs: a multiplication unit using conventional polynomial multiplication, one using conventional Karatsuba polynomial multiplication, and the low-complexity, low-resource, high-bit-width polynomial multiplication unit based on the Karatsuba architecture designed in this scheme ($k = 2$, $t = 16$, $p_{-1} = p_{-2} = 2$).

TABLE 1

In this embodiment, the EDA platform used for simulation, synthesis and implementation is Vivado 2021.1, and the selected FPGA is the Xilinx Virtex-7 xc7vx690tffg1157-3. In the above data, #Slices and #DSP are obtained directly after synthesis and implementation, while #SEC is a computed figure that represents hardware resource consumption or area, calculated as:

#SEC＝#BRAMs×100+#DSPs×100+#Slices

where #BRAMs is 0, since none of the three multipliers uses BRAM. In theory, the ratio of the hardware area of Karatsuba polynomial multiplication to that of the conventional polynomial multiplication algorithm has a lower limit. As Table 1 shows, in the above example the conventional Karatsuba method is slightly above this limit, whereas the present scheme is below it.
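The #SEC metric defined above is a simple weighted sum, which the following Python sketch computes. The function name and argument names are assumptions; no Table 1 values are reproduced here.

```python
def sec(slices, dsps, brams=0):
    """#SEC = #BRAMs * 100 + #DSPs * 100 + #Slices, as defined in the text."""
    return brams * 100 + dsps * 100 + slices
```

The `brams` argument defaults to 0, mirroring the statement that none of the three multipliers uses BRAM.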

The embodiment also provides a CSIDH key exchange acceleration method, which includes: the polynomial multiplication operation in the CSIDH key exchange process is realized by the polynomial multiplication acceleration method.

Further, the number of multipliers is N, where N is the number of terms of the polynomials involved in the CSIDH key exchange process. In a 64-bit integer computing environment, N is 8 for CSIDH key exchange with the CSIDH512 parameter set, 16 with the CSIDH1024 parameter set, and 32 with the CSIDH2048 parameter set.

The CSIDH key exchange process involves polynomial multiplication many times, and every such polynomial multiplication is of the same kind; the number N of multipliers is the number of polynomial terms corresponding to the CSIDH key exchange process with the given parameters.

Correspondingly, the embodiment of the invention also provides a CSIDH encryption and decryption acceleration device, which comprises the acceleration device for polynomial multiplication.

The CSIDH key exchange acceleration method and apparatus provided in this embodiment can improve the efficiency of the CSIDH key exchange process on the basis of reducing the resource consumption of the FPGA hardware implementation of the CSIDH.

In a specific implementation, the present application provides a computer storage medium and a corresponding data processing unit, where the computer storage medium is capable of storing a computer program, and the computer program, when executed by the data processing unit, may execute the inventive content of the method for accelerating polynomial multiplication and some or all of the steps in each embodiment provided in the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like.

It is clear to those skilled in the art that the technical solutions in the embodiments of the present invention can be implemented by means of a computer program and a corresponding general-purpose hardware platform. On this understanding, the technical solutions in the embodiments of the present invention may be implemented, in whole or in part, as a computer program or software product, which may be stored in a storage medium and which includes instructions that cause a device containing a data processing unit (for example a personal computer, a server, a single-chip microcomputer, an MCU, or a network device) to execute the method described in the embodiments, or in parts of the embodiments, of the present invention.

The present invention provides a method and a device for accelerating polynomial multiplication, and there are many ways to implement this technical solution; the above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art may make improvements and modifications without departing from the principle of the present invention, and such improvements and modifications also fall within the protection scope of the present invention. All components not specified in the present embodiment can be realized by the prior art.

Claims

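1. A method for accelerating polynomial multiplication, comprising:

inputting two sets of polynomial coefficients, the number of terms of each set being $N = p_1 p_2 \cdots p_m$, wherein $p_1, p_2, \ldots, p_m$ are the 1st, 2nd, ..., m-th prime factors of the number of terms (repeated factors allowed);

processing the two sets of polynomial coefficients according to all operation rules that the data stream undergoes before reaching the multiplication operation in the Karatsuba algorithms with term counts $p_1, p_2, \ldots, p_m$, to obtain two sets of externally preprocessed data;

performing bit slicing and reordering on the two sets of externally preprocessed data respectively, to obtain sorted data;

processing the sorted data according to all operation rules that the data stream undergoes before reaching the multiplication operation in the Karatsuba algorithms with term counts $p_{-1}, p_{-2}, \ldots, p_{-k}$, to obtain two sets of internally preprocessed data, wherein $p_{-1}, p_{-2}, \ldots, p_{-k}$ are the 1st, 2nd, ..., k-th prime factors designated according to the usage requirement;

multiplying corresponding data in the two sets of internally preprocessed data, to obtain a set of preliminary product data;

processing the preliminary product data according to all operation rules that the data stream undergoes after the multiplication operation in the Karatsuba algorithms with term counts $p_{-1}, p_{-2}, \ldots, p_{-k}$, to obtain internally post-processed data;

reordering, shifting and adding the internally post-processed data, to obtain integrated data;

processing the integrated data according to all operation rules that the data stream undergoes after the multiplication operation in the Karatsuba algorithms with term counts $p_1, p_2, \ldots, p_m$, to obtain the final output data, namely the polynomial coefficients of the product of the two input polynomials.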

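2. An accelerating device for polynomial multiplication, characterized by comprising m preprocessing external chunks, an input sorting module, k preprocessing internal chunks, a group of central multiplier arrays, k post-processing internal chunks, an output integration module and m post-processing external chunks, wherein m and k are positive integers;

the m preprocessing external chunks receive two sets of polynomial coefficients, the number of terms of each set being $N = p_1 p_2 \cdots p_m$, wherein $p_1, p_2, \ldots, p_m$ are the 1st, 2nd, ..., m-th prime factors of the number of terms (repeated factors allowed), and process the two sets of polynomial coefficients according to all operation rules that the data stream undergoes before reaching the multiplication operation in the Karatsuba algorithms with term counts $p_1, p_2, \ldots, p_m$, to obtain two sets of externally preprocessed data;

the input sorting module performs bit slicing and reordering on the two sets of externally preprocessed data respectively, to obtain sorted data;

the k preprocessing internal chunks process the sorted data according to all operation rules that the data stream undergoes before reaching the multiplication operation in the Karatsuba algorithms with term counts $p_{-1}, p_{-2}, \ldots, p_{-k}$, to obtain two sets of internally preprocessed data, wherein $p_{-1}, p_{-2}, \ldots, p_{-k}$ are the 1st, 2nd, ..., k-th prime factors designated according to the usage requirement;

the group of central multiplier arrays multiplies corresponding data in the two sets of internally preprocessed data, to obtain a set of preliminary product data;

the k post-processing internal chunks process the preliminary product data according to all operation rules that the data stream undergoes after the multiplication operation in the Karatsuba algorithms with term counts $p_{-1}, p_{-2}, \ldots, p_{-k}$, to obtain internally post-processed data;

the output integration module performs reordering, shift operations and addition operations on the internally post-processed data, to obtain integrated data;

the m post-processing external chunks process the integrated data according to all operation rules that the data stream undergoes after the multiplication operation in the Karatsuba algorithms with term counts $p_1, p_2, \ldots, p_m$, to obtain the final output data, namely the polynomial coefficients of the product of the two input polynomials.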

3. The apparatus of claim 2, wherein the m preprocessing external chunks are the KA_pre modules of $KA_{p_1}, KA_{p_2}, \ldots, KA_{p_m}$;

the k preprocessing internal chunks are the KA_pre modules of $KA_{p_{-1}}, KA_{p_{-2}}, \ldots, KA_{p_{-k}}$;

the k post-processing internal chunks are the KA_post modules of $KA_{p_{-1}}, KA_{p_{-2}}, \ldots, KA_{p_{-k}}$;

the m post-processing external chunks are the KA_post modules of $KA_{p_1}, KA_{p_2}, \ldots, KA_{p_m}$;

wherein $KA_{p_1}, KA_{p_2}, \ldots, KA_{p_m}$ respectively denote the Karatsuba algorithm modules with term counts $p_1, p_2, \ldots, p_m$;

and $KA_{p_{-1}}, KA_{p_{-2}}, \ldots, KA_{p_{-k}}$ respectively denote the Karatsuba algorithm modules whose term counts are the 1st, 2nd, ..., k-th prime factors $p_{-1}, p_{-2}, \ldots, p_{-k}$ designated according to the usage requirement.

4. The apparatus of claim 3, wherein the central multiplier array comprises a plurality of integer multipliers, the number of which is determined by the structures of $KA_{p_1}, KA_{p_2}, \ldots, KA_{p_m}$ and $KA_{p_{-1}}, KA_{p_{-2}}, \ldots, KA_{p_{-k}}$: if the numbers of central multipliers corresponding to these structures are $l_1, l_2, \ldots, l_m$ and $l_{-1}, l_{-2}, \ldots, l_{-k}$ respectively, then the number of central multipliers is $\prod_{i=1}^{m} l_i \cdot \prod_{j=1}^{k} l_{-j}$.

5. The apparatus of claim 4, wherein the input sorting module is configured to execute an input sorting algorithm whose inputs and outputs are as follows:

the inputs are two groups of binary integer data, namely the 1st, 2nd, ... binary integer data in the first group of input data of the input sorting module and the 1st, 2nd, ... binary integer data in the second group of input data;

the outputs are two groups of output data, each divided into subgroups: the 1st, 2nd, ... binary integer data in the first subgroup of the first group of output data, those in the second subgroup, and so on through the last subgroup of the first group of output data; and likewise the 1st, 2nd, ... binary integer data in the first subgroup, the second subgroup, and so on through the last subgroup of the second group of output data.

6. The apparatus of claim 5, wherein the output integration module is configured to execute an output integration algorithm whose inputs and outputs are as follows:

the inputs are the 1st, 2nd, ... binary integer data in the first group of input data of the output integration module, the 1st, 2nd, ... binary integer data in the second group of input data, and so on through the last group of input data;

the outputs are the 1st, 2nd, ... binary positive integer data in the output data of the output integration module.

7. The apparatus of claim 6, wherein the input sorting module comprises a sorting module and an input reordering module;

the sorting module takes, from each number in the two groups of binary integer data, bits 0 to t-1, bits t to 2t-1, and so on from low to high, each t-bit slice forming a new integer, wherein t is an integer set according to the usage requirement; the new integers obtained from each initial datum form one group, giving a set of new arrays;

the input reordering module takes the new arrays derived from the first input group, extracts the 1st, 2nd, ... data of every such array, and splices the data with the same index together into new 1st, 2nd, ... arrays; it processes the new arrays derived from the second input group in the same way.

8. The apparatus of claim 7, wherein the output integration module comprises an output reordering module, a shift module array and an addition array;

the output reordering module extracts the 1st, 2nd, ... data of each of its input arrays and splices the data with the same index together into new 1st, 2nd, ... arrays.

9. The apparatus of claim 8, wherein the shift module array zero-pads the high-order bits of the 1st, 2nd, ... data in each reordered array and then shifts them left by 0, t, 2t, ... bits respectively, to obtain new data.

10. The apparatus of claim 9, wherein the addition array adds all the shifted data within each array to obtain one sum per array, and outputs the sums of all arrays together as the output data of the addition array.