CN111694540A

CN111694540A - Radix-64 arithmetic circuit for number-theoretic transform multiplication

Info

Publication number: CN111694540A
Application number: CN202010371311.4A
Authority: CN
Inventors: 华斯亮; 卞九辉; 张静亚; 张慧国; 刘玉申; 徐健
Original assignee: Changshu Institute of Technology
Current assignee: Changshu Institute of Technology
Priority date: 2020-05-06
Filing date: 2020-05-06
Publication date: 2020-09-22
Anticipated expiration: 2040-05-06
Also published as: CN111694540B

Abstract

The invention discloses a radix-64 arithmetic circuit for number-theoretic transform multiplication. The circuit comprises 64 operand generation modules: each of the 64 input data is zero-padded at the high-order end and divided into 22 words of 3 bits each, and the words are merged into operands for output, such that 1 module outputs 64 96-bit operands, 48 modules output 22 192-bit operands each, 3 modules output 32 192-bit operands each, and 12 modules output 24 192-bit operands each. Each operand generation module is followed by an operand modular addition module, which performs modular addition on the operands output by that operand generation module, and by a modulo-p module, which reduces the data output by the operand modular addition module modulo the prime p and outputs the result, where p is 2⁶⁴ − 2³² + 1. By merging operands, the invention reduces their number from 4096 in the prior art to 1504, which greatly reduces the computational overhead and improves the efficiency of the radix-64 operation.

Description

Radix-64 arithmetic circuit for number-theoretic transform multiplication

Technical Field

The present invention relates to an arithmetic circuit, and more particularly, to a radix-64 arithmetic circuit for number-theoretic transform multiplication.

Background

Besides conventional long multiplication, large integer multiplication can also be carried out with faster algorithms such as the Schönhage-Strassen algorithm (SSA).

The core idea of the algorithm is as follows: an FFT over a ring is applied to each of the two large integers of length n, converting them into frequency-domain representations; the frequency-domain representations of the two integers are multiplied pointwise to obtain the frequency-domain representation of the product; an IFFT over the ring is then applied to that representation to obtain the product. By using a number-theoretic transform (NTT) instead of a discrete Fourier transform, modular arithmetic replaces floating-point arithmetic and the rounding-error problem is avoided. NTT multiplication refers specifically to multiplication that uses the number-theoretic transform within this algorithm. The NTT and the inverse NTT are the computational core of NTT multiplication and account for more than 90% of its operations and running time, so optimizing the speed, area and power consumption of the NTT has a critical influence on the overall performance of NTT multiplication.

A 16777216-point number-theoretic transform can be decomposed into four stages of radix-64 operation units plus twiddle-factor multiplications. The twiddle factors can be computed in advance, stored in a ROM, and read directly when needed. The radix-64 operations account for more than 90% of the computation in the number-theoretic transform, so optimizing the radix-64 operation is of great importance to the efficiency of the transform.

"Design and Implementation of a Large Integer Multiplier on FPGA", Xie Xing et al., Journal of Electronics & Information Technology, 2019, describes a large integer multiplier hardware architecture based on the Schönhage-Strassen algorithm. The paper decomposes the 65536-point number-theoretic transform into 64-point and 1024-point transforms, where the 1024-point transform is built from two radix-32 stages in series. The radix-64 operation comprises 64 shift units and tree-shaped large-sum processing units. Because the paper uses "0" padding, each tree-shaped large-sum processing unit has to process 64 192-bit data items, and the whole radix-64 operation has to process 64 × 64 = 4096 operands. The radix-64 arithmetic circuit is therefore not efficient enough, and its implementation requires considerable power and resources.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention provides a radix-64 arithmetic circuit for number-theoretic transform multiplication, which addresses the high power consumption and resource overhead of existing radix-64 arithmetic circuits.

The technical scheme of the invention is as follows: a radix-64 operation circuit for number-theoretic transform multiplication, comprising:

operand generation modules, of which there are 64, numbered Xk, k = 0, 1, 2, …, 63, each comprising a dividing circuit, a merging circuit and a zero-padding circuit; the dividing circuit pads each of the 64 input data with high-order zeros and divides it into 22 words of 3 bits each, the divided input data being x_{n,m}, 0 ≤ n < 64, 0 ≤ m < 22; the merging circuit combines the 64 × 22 words of divided input data into operands for output, wherein, among the 64 operand generation modules, 1 outputs 64 96-bit operands, 48 output 22 192-bit operands each, 3 output 32 192-bit operands each, and 12 output 24 192-bit operands each; the zero-padding circuit fills with "0" the bits left empty when the merging circuit outputs the operands;

operand modular addition modules, each of which performs modular addition on the operands output by the corresponding operand generation module;

and

a modulo-p module, which reduces the data output by each operand modular addition module modulo the prime p and outputs the result, the prime p being 2⁶⁴ − 2³² + 1.

Further, the operand generation module that outputs 64 96-bit operands is numbered X0; the low-order 22 words of each 96-bit operand hold the input data, and the high-order 10 words are set to zero.

Further, the operand generation modules that output 22 192-bit operands are numbered Xk with k odd; each operand OP_m is formed by merging 64 different input words x_{n,m}, 0 ≤ n < 64, that share the same word index m, 0 ≤ m < 22, and the position of the lowest bit of x_{n,m} in OP_m is given by 3 × (m + nk) (mod 192).

Further, the operand generation modules that output 32 192-bit operands are numbered X16, X32 and X48; the 32 operands are divided into 16 groups of 2 operands each, OP0 and OP1 forming one group, OP2 and OP3 another, and so on; the operands OP_{2j} and OP_{2j+1} of each group are formed by merging 88 different input words x_{n,m}, 4j ≤ n ≤ 4j+3, 0 ≤ m < 22; the position of the lowest bit of x_{n,m} in OP_{2j} or OP_{2j+1} is given by 3 × (m + nk) (mod 192); x_{n,m} is placed preferentially in OP_{2j} and, if that position in OP_{2j} is already occupied, in the corresponding position of OP_{2j+1}.

Further, the operand generation modules that output 24 192-bit operands are numbered Xk, where k is an even number divisible by 4 or 8, excluding X0, X16, X32 and X48; the 24 operands are divided into 4 groups, OP0 to OP5 forming one group, OP6 to OP11 another, and so on; the operands OP_{6j} to OP_{6j+5} of each group are formed by merging 352 different input words x_{n,m}, 16j ≤ n ≤ 16j+15, 0 ≤ m < 22; the position of the lowest bit of x_{n,m} in OP_{6j} to OP_{6j+5} is given by 3 × (m + nk) (mod 192); the words x_{n,m} are merged into the operands with a period of 4 or 8 words and are placed preferentially in the operand with the smallest index among OP_{6j} to OP_{6j+5}.

Further, the operand generation modules that output 22 192-bit operands are numbered Xk, where k is an even number not divisible by 4 or 8, excluding X0, X16, X32 and X48; the 22 operands are divided into 2 groups, OP0 to OP10 forming one group and OP11 to OP21 the other; the operands OP_{11j} to OP_{11j+10} of each group are formed by merging 704 different input words x_{n,m}, 32j ≤ n ≤ 32j+31, 0 ≤ m < 22; the position of the lowest bit of x_{n,m} in OP_{11j} to OP_{11j+10} is given by 3 × (m + nk) (mod 192); the words x_{n,m} are merged into the operands with a period of 2 words and are placed preferentially in the operand with the smallest index among OP_{11j} to OP_{11j+10}.

The technical scheme provided by the invention has the advantages that:

By using the empty bits that are zero-padded after the operands are shifted, the operands of the radix-64 operation in number-theoretic transform multiplication are merged, reducing the number of operands from 4096 in the prior art to 1504, which greatly reduces the computational overhead and improves the efficiency of the radix-64 operation.

Drawings

FIG. 1 is a schematic diagram of the overall structure of the radix-64 arithmetic circuit for number-theoretic transform multiplication according to the present invention.

Fig. 2 is a schematic diagram of a zero-padding partitioning method for input data by a partitioning circuit in an operand generation module.

FIG. 3 is a diagram of a partitioning circuit in an operand generation module.

Fig. 4 is a schematic diagram of output data obtained by the merging circuit of the X0 operand generation module.

FIG. 5 is a schematic diagram of a merge circuit of the X0 operand generation module.

FIG. 6 shows merged operands of the merge circuit of the X1 operand generation module.

FIG. 7 shows a merging circuit of operand number 0 OP0 in the X1 operand generation block.

FIG. 8 shows merged operands of the merge circuit of the X16 operand generation module.

FIG. 9 shows merged operands of the merge circuit of the X4 operand generation module.

FIG. 10 shows merged operands of the merge circuit of the X2 operand generation module.

Fig. 11 is a circuit schematic diagram of the 64-operand modular addition module.

Fig. 12 is a circuit schematic diagram of the 22-operand modular addition module.

Fig. 13 is a circuit schematic diagram of the 32-operand modular addition module.

Fig. 14 is a circuit schematic diagram of the 24-operand modular addition module.

Detailed Description

The present invention is further described in the following examples, which are intended to be illustrative only and not to be limiting as to the scope of the invention, which is to be given the full breadth of the appended claims and any and all equivalent modifications thereof which would occur to persons skilled in the art upon reading the present specification and which are intended to be within the scope of the present invention as defined in the appended claims.

The formula for the radix-64 operation is as follows:

$$X_k = \sum_{n=0}^{63} x_n W_{64}^{nk} \bmod p$$

where 0 ≤ k < 64, p is a prime number, and W₆₄ is a primitive 64th root of unity modulo p.

The prime p is chosen as the Solinas prime p = 2⁶⁴ − 2³² + 1. This prime supports efficient modulo operations: 2¹⁹² mod p = 1, 2⁹⁶ mod p = −1, and 2⁶⁴ mod p = 2³² − 1. With this prime, the root of unity is W₆₄ = 2³; because it is a power of 2, the multiply-and-add operations can be conveniently converted into shifts and modular additions, which reduces the computational complexity of the number-theoretic transform. The radix-64 operation can thus be written as

$$X_k = \sum_{n=0}^{63} x_n 2^{3nk} \bmod p$$
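The identities quoted for this prime are easy to check numerically. The short Python sketch below does so and also confirms that 2³ has multiplicative order exactly 64 modulo p; the function names are illustrative and do not come from the patent.

```python
# Numerical check of the properties of p = 2^64 - 2^32 + 1 stated in the text.
P = 2**64 - 2**32 + 1
W64 = 2**3  # the 64th root of unity named in the text


def solinas_identities():
    """Return (2^192 mod p, 2^96 mod p, 2^64 mod p)."""
    return pow(2, 192, P), pow(2, 96, P), pow(2, 64, P)


def order_of_w64():
    """Smallest e >= 1 with W64^e = 1 (mod p)."""
    e, acc = 1, W64 % P
    while acc != 1:
        acc = (acc * W64) % P
        e += 1
    return e


def times_power_of_w64(x, e):
    """Multiply x by W64^e using only a left shift and a reduction modulo p."""
    return (x << (3 * (e % 64))) % P


if __name__ == "__main__":
    # -1 is represented by the residue p - 1.
    print(solinas_identities() == (1, P - 1, 2**32 - 1))
    print(order_of_w64())
```

Because every power of W₆₄ is a power of 2, `times_power_of_w64` needs no multiplier, which is the property the circuit relies on.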

Each x_n is divided into 22 words, denoted x_{n,m}, 0 ≤ m < 22, with 3 bits as the basic unit. x_n can then be expressed as

$$x_n = \sum_{m=0}^{21} x_{n,m} 2^{3m}$$

where m denotes the m-th word; x_n has a data width of 64 bits, x_{n,m} is 3 bits wide, and x_{n,21} is 1 bit wide. After the input data is split, the radix-64 operation can be written as

$$X_k = \sum_{n=0}^{63} \sum_{m=0}^{21} x_{n,m} 2^{3(m+nk)} \bmod p$$

and the shifted operands can be merged by using the "0"-padded bits, which reduces the number of operands to be added modulo p.
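As a sanity check on the word-level form of the radix-64 operation, the following Python sketch compares it with the direct form that multiplies by powers of 2³. It is only an illustration: the helper names are made up here, and shifts are taken modulo 192 on the assumption (stated in the text) that 2¹⁹² mod p = 1.

```python
import random

P = 2**64 - 2**32 + 1


def split_words(x):
    """Split a 64-bit value into 22 words of 3 bits; word 21 holds only bit 63."""
    return [(x >> (3 * m)) & 0b111 for m in range(22)]


def radix64_direct(xs, k):
    """X_k as the sum of x_n * (2^3)^(n*k), reduced modulo p."""
    return sum(x * pow(8, n * k, P) for n, x in enumerate(xs)) % P


def radix64_from_words(xs, k):
    """X_k from shifted 3-bit words; shift amounts wrap at 192 bits."""
    total = 0
    for n, x in enumerate(xs):
        for m, word in enumerate(split_words(x)):
            total += word << ((3 * (m + n * k)) % 192)
    return total % P


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.getrandbits(64) for _ in range(64)]
    print(all(radix64_direct(xs, k) == radix64_from_words(xs, k) for k in range(64)))
```

Every shifted word lands on one of 64 aligned 3-bit slots of a 192-bit operand, which is what makes merging words from different inputs into one operand possible.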

Referring to Fig. 1, the radix-64 arithmetic circuit for number-theoretic transform multiplication of this embodiment comprises 64 operand generation modules X0 to X63, operand modular addition modules, and modulo-p modules. According to the number of input operands, the operand modular addition modules are of four kinds: the 64-operand, 22-operand, 32-operand and 24-operand modular addition modules. In the circuit, the 64 64-bit input data are fed to every operand generation module; each operand generation module is followed by an operand modular addition module, and each operand modular addition module is followed by a modulo-p module.

The operand generation module comprises a dividing circuit, a merging circuit and a zero-padding circuit, which in turn divide the 64-bit data, merge the resulting words and pad with zeros to form the operands. Referring to Figs. 2 and 3, the dividing circuit pads each 64-bit input datum x_n with 0 bits at the high-order end to form 66 bits of data and then divides it into 22 words of 3 bits each; because the highest 2 bits are padded zeros, the 22nd word carries only 1 bit of input data. The data division is easy to implement in hardware and adds almost no hardware overhead.

The operand generation modules are numbered Xk, k = 0, 1, 2, …, 63. The merging circuit differs from one operand generation module to another, but the modules can be divided by type into the following five groups, with similar circuits within each group.

Group one: X0, 1 module in total. Group two: Xk with k odd, such as X1, X3 and X5, 32 modules in total. Group three: X16, X32 and X48, 3 modules in total. Group four: Xk with k an even number divisible by 4 or 8, excluding X0, X16, X32 and X48, such as X4, X8 and X12, 12 modules in total. Group five: Xk with k an even number not divisible by 4 or 8, such as X2, X6 and X10, 16 modules in total.
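The grouping can be tallied over k = 0, …, 63 to recover the operand counts quoted in the abstract. The Python sketch below is an illustration only; the function names are hypothetical, and the operand count per module is taken from the technical scheme (64, 22, 32, 24 and 22 operands for groups one to five).

```python
def operands_for_module(k):
    """Number of operands output by module Xk, following the five groups."""
    if k == 0:
        return 64          # group one: X0
    if k % 2 == 1:
        return 22          # group two: odd k
    if k in (16, 32, 48):
        return 32          # group three
    if k % 4 == 0:
        return 24          # group four: remaining multiples of 4
    return 22              # group five: even k not divisible by 4


def tally():
    """Map each operand count to the number of modules that produce it."""
    counts = {}
    for k in range(64):
        c = operands_for_module(k)
        counts[c] = counts.get(c, 0) + 1
    return counts


def total_operands():
    """Total number of operands over all 64 modules."""
    return sum(operands_for_module(k) for k in range(64))


if __name__ == "__main__":
    print(tally())
    print(total_operands(), "operands, versus", 64 * 64, "without merging")
```

Groups two and five both produce 22 operands, which is why the abstract counts 48 modules with 22 operands.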

The data merge operation for each group is explained in groups as follows:

Group one: the merging circuit of the X0 operand generation module.

The operands are in fact the aligned input data; in other words, each operand is derived from the 22 consecutive words that the dividing circuit outputs for one input datum. The merging circuit outputs 64 96-bit operands. Each new 96-bit operand consists of 32 words, of which the low-order 22 words are the input data and the high-order 10 words are set to zero. As shown in Fig. 4, operand j, OP_j, has 96 bits: the corresponding input datum x_n is placed in the low 66 bits and the high 30 bits are filled with zeros. The merging circuit is shown in Fig. 5.

Group two: the merging circuits of the odd-numbered operand generation modules such as X1, X3 and X5.

For the merging circuit of an Xk operand generation module with odd k, the input is the 64 64-bit input data and the output is 22 192-bit operands. Each operand OP_m is formed by merging 64 different words x_{n,m}, 0 ≤ n < 64, that share the same word index m, 0 ≤ m < 22. The position of the lowest bit of x_{n,m} in OP_m is given by 3 × (m + nk) (mod 192). The output operands are described below, taking X1 as an example:

The operands merged by the merging circuit of the X1 operand generation module are shown in Fig. 6. There are 22 merged operands, each formed by merging 64 different words x_{n,m}, 0 ≤ n < 64, that share the same word index m, 0 ≤ m < 22. The lowest bit of x_{0,0} is at position 3 × (0 + 0 × 1) (mod 192) = 0 of OP0, that of x_{1,0} is at position 3 × (0 + 1 × 1) (mod 192) = 3 of OP0, that of x_{0,1} is at position 3 × (1 + 0 × 1) (mod 192) = 3 of OP1, and that of x_{63,1} is at position 3 × (1 + 63 × 1) (mod 192) = 0 of OP1. The merging circuit for operand 0 (OP0) of the X1 operand generation module is shown in Fig. 7.

The operands output by the merging circuits of the remaining operand generation modules of this group follow by analogy.
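The merging rule for odd k can be sketched in a few lines of Python. This is an illustration under the position formula 3 × (m + nk) (mod 192) given above, not the patented circuit itself; `merge_odd` and `lowest_bit_position` are hypothetical names, and the occupancy check is included only to show that no two words ever collide when k is odd.

```python
import random

P = 2**64 - 2**32 + 1


def lowest_bit_position(n, m, k):
    """Bit position of the lowest bit of word x_{n,m} inside its operand."""
    return (3 * (m + n * k)) % 192


def merge_odd(xs, k):
    """Build the 22 operands OP_0..OP_21 of module Xk for odd k."""
    if k % 2 == 0:
        raise ValueError("this sketch covers odd k only")
    operands = [0] * 22
    occupied = [set() for _ in range(22)]
    for n, x in enumerate(xs):
        for m in range(22):
            pos = lowest_bit_position(n, m, k)
            if pos in occupied[m]:
                raise AssertionError("two words would share a position")
            occupied[m].add(pos)
            operands[m] |= ((x >> (3 * m)) & 0b111) << pos
    return operands


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.getrandbits(64) for _ in range(64)]
    ops = merge_odd(xs, 1)
    direct = sum(x * pow(8, n, P) for n, x in enumerate(xs)) % P
    print(len(ops), sum(ops) % P == direct)
```

For odd k the products nk take 64 distinct values modulo 64, so the 64 words with a common index m occupy 64 distinct slots of one 192-bit operand.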

Group three: the merging circuits of the X16, X32 and X48 operand generation modules.

The input is the 64 64-bit input data and the output is 32 192-bit operands. The 32 operands are divided into 16 groups of 2 operands each, OP0 and OP1 forming one group, OP2 and OP3 another, and so on. The operands OP_{2j} and OP_{2j+1} of each group are formed by merging 88 different words x_{n,m}, 4j ≤ n ≤ 4j+3, 0 ≤ m < 22. The position of the lowest bit of x_{n,m} in OP_{2j} or OP_{2j+1} is given by 3 × (m + nk) (mod 192). x_{n,m} is placed preferentially in OP_{2j} and, if that position in OP_{2j} is already occupied, in the corresponding position of OP_{2j+1}. All remaining empty bits are filled with "0". Taking the output data of the merging circuit of the X16 operand generation module as an example, as shown in Fig. 8, there are 16 groups of operands, each group containing 2 merged operands. Each new 192-bit operand consists of 64 words, which come from 4 different input data.
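The "place in OP_{2j} first, otherwise in OP_{2j+1}" rule can be modelled as a first-fit placement. The Python sketch below is one possible reading of that rule, with hypothetical names; it processes the words in order of n and then m, which is an assumption, since the text does not state an order.

```python
import random

P = 2**64 - 2**32 + 1


def merge_group_three(xs, k):
    """Build the 32 operands of module X16, X32 or X48 by first-fit placement."""
    if k not in (16, 32, 48):
        raise ValueError("this sketch covers k = 16, 32, 48 only")
    operands = [0] * 32
    occupied = [set() for _ in range(32)]
    for n, x in enumerate(xs):
        j = n // 4                      # inputs 4j..4j+3 share one operand pair
        for m in range(22):
            pos = (3 * (m + n * k)) % 192
            for idx in (2 * j, 2 * j + 1):
                if pos not in occupied[idx]:
                    occupied[idx].add(pos)
                    operands[idx] |= ((x >> (3 * m)) & 0b111) << pos
                    break
            else:
                raise AssertionError("both operands of the pair are occupied")
    return operands


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.getrandbits(64) for _ in range(64)]
    ops = merge_group_three(xs, 16)
    direct = sum(x * pow(8, 16 * n, P) for n, x in enumerate(xs)) % P
    print(len(ops), sum(ops) % P == direct)
```

Positions that no word claims simply stay zero in the Python integers, which plays the role of the zero-padding circuit.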

Group four: the merging circuits of the operand generation modules Xk, where k is an even number divisible by 4 or 8, excluding X0, X16, X32 and X48.

For the merging circuit of an Xk operand generation module where k is an even number divisible by 4 or 8 other than 0, 16, 32 and 48, the input is the 64 64-bit input data and the output is 24 192-bit operands. The 24 operands are divided into 4 groups of 6 operands each, OP0 to OP5 forming one group, OP6 to OP11 another, and so on. The operands OP_{6j} to OP_{6j+5} of each group are formed by merging 352 different words x_{n,m}, 16j ≤ n ≤ 16j+15, 0 ≤ m < 22. The position of the lowest bit of x_{n,m} in OP_{6j} to OP_{6j+5} is given by 3 × (m + nk) (mod 192). The words x_{n,m} are merged into the operands with a period of 4 or 8 words and are placed preferentially in the operand with the smallest index among OP_{6j} to OP_{6j+5}. All remaining empty bits are filled with "0". Taking the output data of the merging circuit of the X4 operand generation module as an example, as shown in Fig. 9, there are 4 groups of operands, each group containing 6 merged operands. The first group comprises OP0 to OP5, the second group comprises OP6 to OP11, and so on. Each new 192-bit operand consists of 64 words, which come from 16 different input data, each providing 4 or 8 consecutive words.

Group five: the merging circuits of the operand generation modules Xk, where k is an even number not divisible by 4 or 8, excluding X0, X16, X32 and X48.

For the merging circuit of an Xk operand generation module where k is an even number not divisible by 4 or 8 other than 0, 16, 32 and 48, the input is the 64 64-bit input data and the output is 22 192-bit operands. The 22 operands are divided into 2 groups of 11 operands each, OP0 to OP10 forming one group and OP11 to OP21 the other. The operands OP_{11j} to OP_{11j+10} of each group are formed by merging 704 different words x_{n,m}, 32j ≤ n ≤ 32j+31, 0 ≤ m < 22. The position of the lowest bit of x_{n,m} in OP_{11j} to OP_{11j+10} is given by 3 × (m + nk) (mod 192). The words x_{n,m} are merged into the operands with a period of 2 words and are placed preferentially in the operand with the smallest index among OP_{11j} to OP_{11j+10}. All remaining empty bits are filled with "0". Taking the output data of the merging circuit of the X2 operand generation module as an example, as shown in Fig. 10, there are 2 groups of operands, each group containing 11 merged operands. The first group comprises OP0 to OP10 and the second group comprises OP11 to OP21. Each new 192-bit operand consists of 64 words, which come from 32 different input data, each providing 2 consecutive words.
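One way to see why 2, 6 and 11 operands per group are enough is to count how many words compete for the same 3-bit slot. The Python sketch below does this counting; it is an explanatory aid with a hypothetical function name, and it assumes that a first-fit placement needs exactly as many operands as the most contested slot, since each word occupies a single slot.

```python
from collections import Counter


def max_overlap(k, inputs_per_group):
    """Largest number of words x_{n,m} that map to one slot within any group.

    A slot is (m + n*k) mod 64, i.e. the bit position divided by 3.
    """
    worst = 0
    for start in range(0, 64, inputs_per_group):
        depth = Counter(
            (m + n * k) % 64
            for n in range(start, start + inputs_per_group)
            for m in range(22)
        )
        worst = max(worst, max(depth.values()))
    return worst


if __name__ == "__main__":
    print([max_overlap(k, 4) for k in (16, 32, 48)])   # group three
    print(max_overlap(4, 16), max_overlap(8, 16))      # group four examples
    print(max_overlap(2, 32))                          # group five example
```

With all 64 inputs in one group and k odd, the same count gives 22, matching the 22 operands of group two.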

Depending on its group, an operand generation module produces a different number of operands; accordingly, the operand modular addition modules comprise the 64-operand, 22-operand, 32-operand and 24-operand modular addition modules.

The 64-operand modular addition module is shown in Fig. 11, where CSA denotes a carry-save adder, CPA denotes a ripple-carry adder, and "<<1" denotes that the carry output of a carry-save adder is shifted left by 1 bit. Of the 64 operands, those at positions 4i, i = 1, 2, …, 16, are held back, and the remaining operands are input to the first-layer CSAs in groups of three; the carry output of each first-layer CSA shifted left by 1 bit, its sum output, and the operand at position 4i, i = 1, 2, …, 16, are input to a second-layer CSA; for every two second-layer CSAs, the two sum outputs and the carry output of one of them shifted left by 1 bit are input to a third-layer CSA; the carry output of the third-layer CSA shifted left by 1 bit, its sum output, and the carry output of the other of the two second-layer CSAs shifted left by 1 bit are input to a fourth-layer CSA; for every two fourth-layer CSAs, the two sum outputs and the carry output of one of them shifted left by 1 bit are input to a fifth-layer CSA; the carry output of the fifth-layer CSA shifted left by 1 bit, its sum output, and the carry output of the other of the two fourth-layer CSAs shifted left by 1 bit are input to a sixth-layer CSA; for every two sixth-layer CSAs, the two sum outputs and the carry output of one of them shifted left by 1 bit are input to a seventh-layer CSA; the carry output of the seventh-layer CSA shifted left by 1 bit, its sum output, and the carry output of the other of the two sixth-layer CSAs shifted left by 1 bit are input to an eighth-layer CSA; the eighth layer has two CSAs in total, and the carry output of the second CSA shifted left by 1 bit, the sum output of the second CSA and the sum output of the first CSA are input to the ninth-layer CSA (1 in total); the carry output of the ninth-layer CSA shifted left by 1 bit, its sum output, and the carry output of the first eighth-layer CSA shifted left by 1 bit are input to the tenth-layer CSA; the carry output of the tenth-layer CSA shifted left by 1 bit and its sum output are input to the CPA, and the result is input to the modular addition module. The modular addition module takes the 193-bit-wide data and adds its low 192 bits to its 193rd bit; the output is congruent to the input modulo the prime p.

The 22-operand modular addition module is shown in Fig. 12, where CSA denotes a carry-save adder, CPA denotes a ripple-carry adder, and "ROL 1-bit" denotes that the carry output of a carry-save adder is rotated left by 1 bit. The 22 operands are divided into two groups of 11, and the same operations are performed on each group, as follows: operands 1, 2, 3; 5, 6, 7; and 9, 10, 11 of the 11 operands are input to the three first-layer CSAs respectively; the sum output of the first first-layer CSA, operand 4, and the carry output of the second first-layer CSA rotated left by 1 bit are input to the first second-layer CSA; operand 8, the carry output of the third first-layer CSA rotated left by 1 bit, and its sum output are input to the second second-layer CSA; the carry output of the first first-layer CSA rotated left by 1 bit, the carry output of the first second-layer CSA rotated left by 1 bit, and its sum output are input to the first third-layer CSA; the sum output of the second first-layer CSA, the carry output of the second second-layer CSA rotated left by 1 bit, and its sum output are input to the second third-layer CSA; the sum output of the first third-layer CSA, the carry output of the second third-layer CSA rotated left by 1 bit, and its sum output are input to the fourth-layer CSA; the carry output of the first third-layer CSA rotated left by 1 bit, the carry output of the fourth-layer CSA rotated left by 1 bit, and its sum output are input to the fifth-layer CSA.
The sum output of the fifth-layer CSA of the first group, the carry output of the fifth-layer CSA of the second group rotated left by 1 bit, and its sum output are input to the sixth-layer CSA; the carry output of the fifth-layer CSA of the first group rotated left by 1 bit, the carry output of the sixth-layer CSA rotated left by 1 bit, and its sum output are input to the seventh-layer CSA; the carry output of the seventh-layer CSA rotated left by 1 bit and its sum output are input to the CPA, and the result is input to the modular addition module. The modular addition module takes the 193-bit-wide data and adds its low 192 bits to its 193rd bit; the output is congruent to the input modulo the prime p.
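The role of the 1-bit rotation and of the final 193-bit fold can be illustrated with a small software model. The Python sketch below is not the wiring of Fig. 12: it chains carry-save additions in a simple queue order chosen for brevity, and all names are hypothetical. It relies only on facts stated in the text, namely that operands are 192 bits wide and that 2¹⁹² mod p = 1.

```python
import random

P = 2**64 - 2**32 + 1
WIDTH = 192
MASK = (1 << WIDTH) - 1


def rol1(x):
    """Rotate a 192-bit value left by one bit."""
    return ((x << 1) | (x >> (WIDTH - 1))) & MASK


def csa(a, b, c):
    """Carry-save adder: a + b + c = sum + 2*carry, with 2*carry done as ROL."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, rol1(carry)


def fold193(v):
    """Add the low 192 bits of a 193-bit value to its 193rd bit."""
    return (v & MASK) + (v >> WIDTH)


def modular_sum(operands):
    """Reduce the operands with CSAs, then one carry-propagate add and a fold."""
    vals = list(operands)
    while len(vals) > 2:
        s, c = csa(vals[0], vals[1], vals[2])
        vals = vals[3:] + [s, c]       # queue order, not the layout of Fig. 12
    return fold193(sum(vals))          # sum of at most two 192-bit values


if __name__ == "__main__":
    rng = random.Random(0)
    ops = [rng.getrandbits(WIDTH) for _ in range(22)]
    print(modular_sum(ops) % P == sum(ops) % P)
```

Rotating instead of shifting keeps every intermediate value at 192 bits: the bit that would leave the top re-enters at the bottom, which preserves the value modulo 2¹⁹² − 1 and therefore modulo p.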

The 32-operand modular addition module is shown in Fig. 13, where CSA denotes a carry-save adder, CPA denotes a ripple-carry adder, and "ROL 1-bit" denotes that the carry output of a carry-save adder is rotated left by 1 bit. Of the 32 operands, those at positions 4i, i = 1, 2, …, 8, are held back, and the remaining operands are input to the first-layer CSAs in groups of three; the carry output of each first-layer CSA shifted left by 1 bit, its sum output, and the operand at position 4i, i = 1, 2, …, 8, are input to a second-layer CSA; for every two second-layer CSAs, the two sum outputs and the carry output of one of them shifted left by 1 bit are input to a third-layer CSA; the carry output of the third-layer CSA shifted left by 1 bit, its sum output, and the carry output of the other of the two second-layer CSAs shifted left by 1 bit are input to a fourth-layer CSA; for every two fourth-layer CSAs, the two sum outputs and the carry output of one of them shifted left by 1 bit are input to a fifth-layer CSA; the carry output of the fifth-layer CSA shifted left by 1 bit, its sum output, and the carry output of the other of the two fourth-layer CSAs shifted left by 1 bit are input to a sixth-layer CSA; the sixth layer has two CSAs in total, and the carry output of the second CSA shifted left by 1 bit, the sum output of the second CSA and the sum output of the first CSA are input to the seventh-layer CSA (1 in total); the carry output of the seventh-layer CSA shifted left by 1 bit, its sum output, and the carry output of the first sixth-layer CSA shifted left by 1 bit are input to the eighth-layer CSA; the carry output of the eighth-layer CSA shifted left by 1 bit and its sum output are input to the CPA, and the result is input to the modular addition module.
The modular addition module takes the 193-bit-wide data and adds its low 192 bits to its 193rd bit; the output is congruent to the input modulo the prime p.

The 24-operand modular addition module is shown in Fig. 14, where CSA denotes a carry-save adder, CPA denotes a ripple-carry adder, and "ROL 1-bit" denotes that the carry output of a carry-save adder is rotated left by 1 bit. The 24 operands are input to the first-layer CSAs in groups of three; for every two first-layer CSAs, the two sum outputs and the carry output of one of them rotated left by 1 bit are input to a second-layer CSA; the carry output of the second-layer CSA rotated left by 1 bit, its sum output, and the carry output of the other of the two first-layer CSAs rotated left by 1 bit are input to a third-layer CSA; for every two third-layer CSAs, the two sum outputs and the carry output of one of them rotated left by 1 bit are input to a fourth-layer CSA; the fourth layer has two CSAs in total; the carry output of each fourth-layer CSA rotated left by 1 bit, its sum output, and the carry output of the other of the two third-layer CSAs rotated left by 1 bit are input to a fifth-layer CSA; the fifth layer has two CSAs in total, and the carry output of the second CSA rotated left by 1 bit, the sum output of the second CSA and the sum output of the first CSA are input to the sixth-layer CSA (1 in total); the carry output of the sixth-layer CSA rotated left by 1 bit, its sum output, and the carry output of the first fifth-layer CSA rotated left by 1 bit are input to the seventh-layer CSA; the carry output of the seventh-layer CSA rotated left by 1 bit and its sum output are input to the CPA, and the result is input to the modular addition module. The modular addition module takes the 193-bit-wide data and adds its low 192 bits to its 193rd bit; the output is congruent to the input modulo the prime p.

The modulo-p module reduces its input data modulo the prime p.

Claims

1. A radix-64 arithmetic circuit for number-theoretic transform multiplication, characterized by comprising: 64 operand generation modules numbered Xk, k = 0, 1, 2, …, 63, each comprising a dividing circuit, a merging circuit and a zero-padding circuit, wherein the dividing circuit pads each of the 64 input data with high-order zeros and divides it into 22 words of 3 bits each, the divided input data being x_{n,m}, 0 ≤ n < 64, 0 ≤ m < 22; the merging circuit combines the 64 × 22 words of divided input data into operands for output, wherein, among the 64 operand generation modules, 1 outputs 64 96-bit operands, 48 output 22 192-bit operands each, 3 output 32 192-bit operands each, and 12 output 24 192-bit operands each; and the zero-padding circuit fills with "0" the bits left empty when the merging circuit outputs the operands; operand modular addition modules, which perform modular addition on the operands output by each operand generation module;

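and a modulo-p module, which reduces the data output by each operand modular addition module modulo the prime p and outputs the result, the prime p being 2⁶⁴ − 2³² + 1.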

2. The radix-64 arithmetic circuit for number-theoretic transform multiplication of claim 1, wherein the operand generation module that outputs 64 96-bit operands is numbered X0, the low-order 22 words of each 96-bit operand being the input data and the high-order 10 words being set to zero.

3. The radix-64 arithmetic circuit for number-theoretic transform multiplication of claim 1, wherein the operand generation modules that output 22 192-bit operands are numbered Xk with k odd, each operand OP_m being formed by merging 64 different input words x_{n,m}, 0 ≤ n < 64, that share the same word index m, 0 ≤ m < 22, and the position of the lowest bit of x_{n,m} in OP_m being given by 3 × (m + nk) (mod 192).

4. The radix-64 arithmetic circuit for number-theoretic transform multiplication of claim 1, wherein the operand generation modules that output 32 192-bit operands are numbered X16, X32 and X48; the 32 operands are divided into 16 groups of 2 operands each, OP0 and OP1 forming one group, OP2 and OP3 another, and so on; the operands OP_{2j} and OP_{2j+1} of each group are formed by merging 88 different input words x_{n,m}, 4j ≤ n ≤ 4j+3, 0 ≤ m < 22; the position of the lowest bit of x_{n,m} in OP_{2j} or OP_{2j+1} is given by 3 × (m + nk) (mod 192); and x_{n,m} is placed preferentially in OP_{2j} and, if that position in OP_{2j} is already occupied, in the corresponding position of OP_{2j+1}.

5. The radix-64 arithmetic circuit for number-theoretic transform multiplication of claim 1, wherein the operand generation modules that output 24 192-bit operands are numbered Xk, where k is an even number divisible by 4 or 8, excluding X0, X16, X32 and X48; the 24 operands are divided into 4 groups, OP0 to OP5 forming one group, OP6 to OP11 another, and so on; the operands OP_{6j} to OP_{6j+5} of each group are formed by merging 352 different input words x_{n,m}, 16j ≤ n ≤ 16j+15, 0 ≤ m < 22; the position of the lowest bit of x_{n,m} in OP_{6j} to OP_{6j+5} is given by 3 × (m + nk) (mod 192); and the words x_{n,m} are merged into the operands with a period of 4 or 8 words and are placed preferentially in the operand with the smallest index among OP_{6j} to OP_{6j+5}.

6. The radix-64 arithmetic circuit for number-theoretic transform multiplication of claim 1, wherein the operand generation modules that output 22 192-bit operands are numbered Xk, where k is an even number not divisible by 4 or 8, excluding X0, X16, X32 and X48; the 22 operands are divided into 2 groups, OP0 to OP10 forming one group and OP11 to OP21 the other; the operands OP_{11j} to OP_{11j+10} of each group are formed by merging 704 different input words x_{n,m}, 32j ≤ n ≤ 32j+31, 0 ≤ m < 22; the position of the lowest bit of x_{n,m} in OP_{11j} to OP_{11j+10} is given by 3 × (m + nk) (mod 192); and the words x_{n,m} are merged into the operands with a period of 2 words and are placed preferentially in the operand with the smallest index among OP_{11j} to OP_{11j+10}.