Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a radix-64 operation circuit for number-theoretic transform multiplication that reduces the high power consumption and resource overhead of existing radix-64 operation circuits.
The technical scheme of the invention is as follows: a radix-64 operation circuit for number-theoretic transform multiplication, comprising:
64 operand generation modules, numbered Xk, k = 0, 1, 2, …, 63, each comprising a dividing circuit, a merging circuit and a zero-padding circuit, wherein the dividing circuit pads each of the 64 input data with high-order zeros and divides it into 22 words of 3 bits each, the divided input data being denoted $x_{n,m}$, $0 \le n < 64$, $0 \le m < 22$; the merging circuit merges the 64 × 22 words of divided input data into output operands, such that, of the 64 operand generation modules, 1 outputs 64 96-bit operands, 48 output 22 192-bit operands, 3 output 32 192-bit operands, and 12 output 24 192-bit operands; and the zero-padding circuit fills with "0" the bit positions left empty when the merging circuit outputs the operands;
operand modular addition modules, each performing modular addition on the operands output by the corresponding operand generation module;
and
modulo-p modules, each reducing the data output by the corresponding operand modular addition module modulo a prime p and outputting the result, where $p = 2^{64} - 2^{32} + 1$.
Further, the operand generation module that outputs 64 96-bit operands is numbered X0; the low-order 22 words of each 96-bit operand hold the input data, and the high-order 10 words are set to zero.
Further, the operand generation modules numbered Xk with k odd output 22 192-bit operands; each operand $OP_m$ is formed by merging the words $x_{n,m}$, $0 \le n < 64$, taken from 64 different input data at the same word index m, $0 \le m < 22$, and the position in $OP_m$ of the lowest bit of $x_{n,m}$ is given by $3 \times (m + nk) \pmod{192}$.
Further, the operand generation modules that output 32 192-bit operands are numbered X16, X32 and X48; the 32 operands are divided into 16 groups of 2 operands each, OP0 and OP1 forming one group, OP2 and OP3 forming the next, and so on; the operands $OP_{2j}$ and $OP_{2j+1}$ of each group are formed by merging the 88 words $x_{n,m}$, $4j \le n \le 4j+3$, $0 \le m < 22$; the position in $OP_{2j}$ or $OP_{2j+1}$ of the lowest bit of $x_{n,m}$ is given by $3 \times (m + nk) \pmod{192}$; and $x_{n,m}$ is placed preferentially in $OP_{2j}$, or, if that position in $OP_{2j}$ is already occupied, at the corresponding position in $OP_{2j+1}$.
Further, the operand generation modules that output 24 192-bit operands are numbered Xk, excluding X0, X16, X32 and X48, where k is an even number divisible by 4 or 8; the 24 operands are divided into 4 groups, OP0 to OP5 forming one group, OP6 to OP11 forming the next, and so on; the operands $OP_{6j}$ to $OP_{6j+5}$ of each group are formed by merging the 352 words $x_{n,m}$, $16j \le n \le 16j+15$, $0 \le m < 22$; the position in $OP_{6j}$ to $OP_{6j+5}$ of the lowest bit of $x_{n,m}$ is given by $3 \times (m + nk) \pmod{192}$; and the words $x_{n,m}$ are merged into the operands with a period of 4 or 8 words and are placed preferentially in the operand with the smaller index among $OP_{6j}$ to $OP_{6j+5}$.
Further, the operand generation modules numbered Xk, excluding X0, X16, X32 and X48, where k is an even number not divisible by 4 or 8, output 22 192-bit operands; the 22 operands are divided into 2 groups, OP0 to OP10 forming one group and OP11 to OP21 the other; the operands $OP_{11j}$ to $OP_{11j+10}$ of each group are formed by merging the 704 words $x_{n,m}$, $32j \le n \le 32j+31$, $0 \le m < 22$; the position in $OP_{11j}$ to $OP_{11j+10}$ of the lowest bit of $x_{n,m}$ is given by $3 \times (m + nk) \pmod{192}$; and the words $x_{n,m}$ are merged into the operands with a period of 2 words and are placed preferentially in the operand with the smaller index among $OP_{11j}$ to $OP_{11j+10}$.
The technical scheme provided by the invention has the advantages that:
By making use of the empty bit positions that would otherwise be zero-padded after the operands are shifted, the operands of the radix-64 operation in number-theoretic transform multiplication are merged. The number of operands is reduced from 4096 in the prior art to 1504, which greatly reduces the computational overhead and improves the efficiency of the radix-64 operation.
Detailed Description
The invention is further described in the following embodiments. These embodiments are illustrative only and do not limit the scope of the invention; equivalent modifications that occur to persons skilled in the art upon reading this specification fall within the scope defined by the appended claims.
The radix-64 operation is defined as
$$X_k = \sum_{n=0}^{63} x_n W_{64}^{nk} \bmod p$$
where $0 \le k < 64$, $p$ is a prime, and $W_{64}$ is a 64th root of unity.
When $p$ is the Solinas prime $p = 2^{64} - 2^{32} + 1$, modular reduction is efficient, because $2^{192} \bmod p = 1$, $2^{96} \bmod p = -1$ and $2^{64} \bmod p = 2^{32} - 1$. With this prime, the root of unity $W_{64} = 2^{3}$ is a power of 2, so the multiply-add operations can be converted into shifts and modular additions, which reduces the computational complexity of the number-theoretic transform. The radix-64 operation can therefore be written as
$$X_k = \sum_{n=0}^{63} x_n 2^{3nk} \bmod p$$
Each $x_n$ is divided into 22 words of 3 bits each, denoted $x_{n,m}$, $0 \le m < 22$, so that $x_n$ can be expressed as
$$x_n = \sum_{m=0}^{21} x_{n,m} 2^{3m}$$
where m denotes the m-th word, $x_n$ has a data width of 64 bits, each $x_{n,m}$ is 3 bits wide, and $x_{n,21}$ carries only 1 bit of data. After the input data are split, the radix-64 operation can be written as
$$X_k = \sum_{n=0}^{63} \sum_{m=0}^{21} x_{n,m} 2^{3(m+nk)} \bmod p$$
and the shifted operands can be merged by making use of the positions that would otherwise be zero-padded, which reduces the number of operands to be modularly added.
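The identities and the word-level form above can be checked numerically. The following is a minimal Python sketch, not part of the described circuit; the function names and the sample data are hypothetical and introduced only for illustration.

```python
# Illustrative sketch; the function names are not from the source.
P = 2**64 - 2**32 + 1   # Solinas prime p
W64 = 2**3              # 64th root of unity used in the text

def radix64_direct(x, k):
    """X_k as a sum of products x_n * W64^(n*k), reduced modulo p."""
    return sum(xn * pow(W64, n * k, P) for n, xn in enumerate(x)) % P

def radix64_shift_add(x, k):
    """X_k computed from 3-bit words using shifts and additions only."""
    total = 0
    for n, xn in enumerate(x):
        for m in range(22):
            word = (xn >> (3 * m)) & 0b111      # x_{n,m}
            shift = (3 * (m + n * k)) % 192     # valid because 2^192 = 1 (mod p)
            total += word << shift
    return total % P

# Deterministic sample inputs below 2^64 (hypothetical test data).
sample = [(n * 0x9E3779B97F4A7C15 + 1) % 2**64 for n in range(64)]
```

Both functions should agree for every k from 0 to 63; the second one uses no multiplication by a root of unity, which is the property the merging circuits rely on.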
Referring to Fig. 1, the radix-64 operation circuit for number-theoretic transform multiplication of this embodiment comprises 64 operand generation modules X0 to X63, operand modular addition modules, and modulo-p modules. According to the number of input operands, the operand modular addition modules are of four kinds: 64-operand, 22-operand, 32-operand and 24-operand modular addition modules. The 64 64-bit input data of the circuit are fed to every operand generation module; each operand generation module is followed by an operand modular addition module, which in turn is followed by a modulo-p module.
The operand generation module comprises a dividing circuit, a merging circuit and a zero-padding circuit, which in turn divide, merge and zero-pad the 64-bit data to form the operands. Referring to Figs. 2 and 3, the dividing circuit pads the high-order end of each 64-bit input datum $x_n$ with 0 bits to form 66 bits of data, and then divides these into 22 words of 3 bits each; because the highest 2 bits are padding, the 22nd word carries only 1 bit of data. This splitting is easy to implement with existing hardware and incurs little hardware overhead.
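As a software illustration of the dividing step, the sketch below zero-extends a 64-bit value to 66 bits and cuts it into 22 three-bit words. The helper names `split_words` and `join_words` are hypothetical; in hardware the same operation amounts to selecting bit ranges.

```python
WORD_BITS = 3
NUM_WORDS = 22   # 22 * 3 = 66 bits: 64 data bits plus 2 padding zeros

def split_words(x_n):
    """Return the words x_{n,0} .. x_{n,21} of a 64-bit unsigned value."""
    if not 0 <= x_n < 2**64:
        raise ValueError("expected a 64-bit unsigned value")
    return [(x_n >> (WORD_BITS * m)) & 0b111 for m in range(NUM_WORDS)]

def join_words(words):
    """Inverse of split_words: reassemble the value from its words."""
    return sum(w << (WORD_BITS * m) for m, w in enumerate(words))
```

For the all-ones input `2**64 - 1`, the first 21 words are 7 and the last word is 1, reflecting the two padded zero bits.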
The operand generation modules are numbered Xk, k = 0, 1, 2, …, 63. The merging circuit differs from module to module, but the modules can be divided into 5 groups by type, with similar circuits within each group.
Group one: X0, 1 in total; group two: Xk with k odd, such as X1, X3 and X5, 32 in total; group three: X16, X32 and X48, 3 in total; group four: Xk with k an even number divisible by 4 or 8, excluding X0, X16, X32 and X48, such as X4, X8 and X12, 12 in total; group five: Xk with k an even number not divisible by 4 or 8, excluding X0, X16, X32 and X48, such as X2, X6 and X10, 16 in total.
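The grouping can be summarised in a few lines of Python. This is only a bookkeeping sketch (the function name `classify` is hypothetical); it tallies the modules per group and the total number of operands using the per-module operand counts stated in this specification.

```python
from collections import Counter

def classify(k):
    """Return (group number, operand count, operand width in bits) for module Xk."""
    if k == 0:
        return 1, 64, 96
    if k % 2 == 1:
        return 2, 22, 192
    if k in (16, 32, 48):
        return 3, 32, 192
    if k % 4 == 0:
        return 4, 24, 192
    return 5, 22, 192

modules_per_group = Counter(classify(k)[0] for k in range(64))
total_operands = sum(classify(k)[1] for k in range(64))
```

The total is 64 + 48 × 22 + 3 × 32 + 12 × 24 operands, compared with 64 × 64 = 4096 when every module adds one shifted operand per input.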
The data merging operation of each group is explained below, group by group.
Group one: the merging circuit of the X0 operand generation module.
The operands are simply the aligned input data; in other words, each operand is taken from the 22 consecutive words that the dividing circuit outputs for one input datum. The merging circuit outputs 64 96-bit operands. Each 96-bit operand consists of 32 words: the low-order 22 words hold the input data and the high-order 10 words are set to zero. As shown in Fig. 4, operand No. j, $OP_j$, is 96 bits wide; the input datum $x_j$ is placed in the low 66 bits and the high 30 bits are zero-filled. The merging circuit is shown in Fig. 5.
Group two: the merging circuits of the odd-numbered operand generation modules X1, X3, X5 and so on.
For the merging circuit of an Xk operand generation module with k odd, the input is the 64 64-bit input data and the output is 22 192-bit operands. Each operand $OP_m$ is formed by merging the words $x_{n,m}$, $0 \le n < 64$, taken from 64 different input data at the same word index m, $0 \le m < 22$. The position in $OP_m$ of the lowest bit of $x_{n,m}$ is given by $3 \times (m + nk) \pmod{192}$. The operands output by X1 are described below as an example.
The operands merged by the merging circuit of the X1 operand generation module are shown in Fig. 6. There are 22 merged operands, each formed from 64 words $x_{n,m}$, $0 \le n < 64$, with the same word index m. The lowest bit of $x_{0,0}$ is at position $3 \times (0 + 0 \times 1) \pmod{192} = 0$ of OP0; the lowest bit of $x_{1,0}$ is at position $3 \times (0 + 1 \times 1) \pmod{192} = 3$ of OP0; the lowest bit of $x_{0,1}$ is at position $3 \times (1 + 0 \times 1) \pmod{192} = 3$ of OP1; and the lowest bit of $x_{63,1}$ is at position $3 \times (1 + 63 \times 1) \pmod{192} = 0$ of OP1. The merging circuit for operand No. 0, OP0, of the X1 operand generation module is shown in Fig. 7.
The operands output by the merging circuits of the other odd-numbered operand generation modules follow by analogy.
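The group-two rule can be modelled in software as follows. This is a sketch under the stated position formula, not the circuit of Figs. 6 and 7; the name `merge_odd` is hypothetical. The internal assertion checks that no two words ever compete for the same 3-bit slot when k is odd.

```python
P = 2**64 - 2**32 + 1

def merge_odd(x, k):
    """Build the 22 operands OP_m of module Xk (k odd) from 64 inputs x[n] < 2^64."""
    assert k % 2 == 1 and len(x) == 64
    operands = []
    for m in range(22):
        op, used = 0, 0
        for n, xn in enumerate(x):
            pos = (3 * (m + n * k)) % 192       # lowest bit of x_{n,m} in OP_m
            assert not (used >> pos) & 0b111    # the 3-bit slot must still be empty
            used |= 0b111 << pos
            op |= ((xn >> (3 * m)) & 0b111) << pos
        operands.append(op)
    return operands
```

Because the slots are disjoint, each operand is the plain sum of its shifted words, so adding the 22 operands and reducing modulo p should give the same X_k as the direct definition.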
Group three: the merging circuits of the X16, X32 and X48 operand generation modules.
The input is the 64 64-bit input data and the output is 32 192-bit operands. The 32 operands are divided into 16 groups of 2 operands each: OP0 and OP1 form one group, OP2 and OP3 the next, and so on. The operands $OP_{2j}$ and $OP_{2j+1}$ of each group are formed by merging the 88 words $x_{n,m}$, $4j \le n \le 4j+3$, $0 \le m < 22$. The position in $OP_{2j}$ or $OP_{2j+1}$ of the lowest bit of $x_{n,m}$ is given by $3 \times (m + nk) \pmod{192}$. Each $x_{n,m}$ is placed preferentially in $OP_{2j}$; if that position in $OP_{2j}$ is already occupied, it is placed at the corresponding position in $OP_{2j+1}$. All remaining empty positions are filled with "0". Taking the output of the merging circuit of the X16 operand generation module as an example, as shown in Fig. 8, there are 16 groups of operands, each containing 2 merged operands. Each new 192-bit operand consists of 64 words, which come from 4 different input data.
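The placement rule for group three can be sketched in Python as below. The function name `merge_group_three` is hypothetical, and the order in which words are visited (by n, then by m) is an assumption of the sketch rather than something the text specifies.

```python
P = 2**64 - 2**32 + 1

def merge_group_three(x, k):
    """Build the 32 operands of X16, X32 or X48 from 64 inputs x[n] < 2^64."""
    assert k in (16, 32, 48) and len(x) == 64
    operands = [0] * 32
    used = [0] * 32                        # occupancy mask of each operand
    for n, xn in enumerate(x):
        j = n // 4                         # inputs 4j .. 4j+3 share OP_2j and OP_2j+1
        for m in range(22):
            pos = (3 * (m + n * k)) % 192
            slot = 0b111 << pos
            # Prefer OP_2j; fall back to OP_2j+1 if the slot is already taken.
            target = 2 * j if not used[2 * j] & slot else 2 * j + 1
            assert not used[target] & slot  # two operands per group suffice
            used[target] |= slot
            operands[target] |= ((xn >> (3 * m)) & 0b111) << pos
    return operands
```

Unused slots stay zero, which corresponds to the zero-padding circuit; the sum of the 32 operands modulo p should again equal X_k.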
Group four: the merging circuits of the operand generation modules Xk, excluding X0, X16, X32 and X48, where k is an even number divisible by 4 or 8.
For the merging circuit of an Xk operand generation module where k is an even number divisible by 4 or 8, other than 0, 16, 32 and 48, the input is the 64 64-bit input data and the output is 24 192-bit operands. The 24 operands are divided into 4 groups of 6 operands each: OP0 to OP5 form one group, OP6 to OP11 the next, and so on. The operands $OP_{6j}$ to $OP_{6j+5}$ of each group are formed by merging the 352 words $x_{n,m}$, $16j \le n \le 16j+15$, $0 \le m < 22$. The position in $OP_{6j}$ to $OP_{6j+5}$ of the lowest bit of $x_{n,m}$ is given by $3 \times (m + nk) \pmod{192}$. The words $x_{n,m}$ are merged into the operands with a period of 4 or 8 words and are placed preferentially in the operand with the smaller index among $OP_{6j}$ to $OP_{6j+5}$. All remaining empty positions are filled with "0". Taking the output of the merging circuit of the X4 operand generation module as an example, as shown in Fig. 9, there are 4 groups of operands, each containing 6 merged operands. The first group comprises OP0 to OP5, the second group comprises OP6 to OP11, and so on. Each new 192-bit operand consists of 64 words, which come from 16 different input data, each providing 4 or 8 consecutive words.
Group five: the merging circuits of the operand generation modules Xk, excluding X0, X16, X32 and X48, where k is an even number not divisible by 4 or 8.
For the merging circuit of an Xk operand generation module where k is an even number not divisible by 4 or 8, the input is the 64 64-bit input data and the output is 22 192-bit operands. The 22 operands are divided into 2 groups of 11 operands each: OP0 to OP10 form one group and OP11 to OP21 the other. The operands $OP_{11j}$ to $OP_{11j+10}$ of each group are formed by merging the 704 words $x_{n,m}$, $32j \le n \le 32j+31$, $0 \le m < 22$. The position in $OP_{11j}$ to $OP_{11j+10}$ of the lowest bit of $x_{n,m}$ is given by $3 \times (m + nk) \pmod{192}$. The words $x_{n,m}$ are merged into the operands with a period of 2 words and are placed preferentially in the operand with the smaller index among $OP_{11j}$ to $OP_{11j+10}$. All remaining empty positions are filled with "0". Taking the output of the merging circuit of the X2 operand generation module as an example, as shown in Fig. 10, there are 2 groups of operands, each containing 11 merged operands. The first group comprises OP0 to OP10 and the second group comprises OP11 to OP21. Each new 192-bit operand consists of 64 words, which come from 32 different input data, each providing 2 consecutive words.
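The "smaller index first" rule of groups four and five can be approximated by a first-fit placement, sketched below. The function name `merge_first_fit` and the word-by-word visiting order are assumptions of this sketch; the wiring of Figs. 9 and 10 may assign individual words to operands differently while using the same number of operands.

```python
P = 2**64 - 2**32 + 1

def merge_first_fit(x, k, inputs_per_group):
    """Pack the words of each block of inputs into 192-bit operands, always
    choosing the lowest-indexed operand of the block whose slot is free.
    Returns one list of operands per block."""
    groups = []
    for start in range(0, 64, inputs_per_group):
        ops, used = [], []
        for n in range(start, start + inputs_per_group):
            for m in range(22):
                pos = (3 * (m + n * k)) % 192
                slot = 0b111 << pos
                for i in range(len(ops) + 1):
                    if i == len(ops):           # every existing operand is occupied here
                        ops.append(0)
                        used.append(0)
                    if not used[i] & slot:
                        used[i] |= slot
                        ops[i] |= ((x[n] >> (3 * m)) & 0b111) << pos
                        break
        groups.append(ops)
    return groups
```

With blocks of 16 inputs for group four and 32 inputs for group five, this first-fit rule should need 6 and 11 operands per block respectively, matching the 24 and 22 operands stated above.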
Because the operand generation modules of the different groups produce different numbers of operands, the operand modular addition modules comprise a 64-operand modular addition module, a 22-operand modular addition module, a 32-operand modular addition module and a 24-operand modular addition module.
The 64-operand modular addition module is shown in Fig. 11, where CSA denotes a carry-save adder, CPA denotes a ripple-carry adder, and "<<1" denotes shifting the carry output of a carry-save adder left by 1 bit. Of the 64 operands, those at positions 4i, i = 1, 2, …, 16, are held back, and the remaining operands are fed three at a time into the first-layer CSAs. The carry output of each first-layer CSA shifted left by 1 bit, its sum output, and the operand at position 4i, i = 1, 2, …, 16, are fed into a second-layer CSA. For every two second-layer CSAs, their sum outputs and the carry output of one of them shifted left by 1 bit are fed into a third-layer CSA. The carry output of the third-layer CSA shifted left by 1 bit, its sum output, and the carry output of the other of the two second-layer CSAs shifted left by 1 bit are fed into a fourth-layer CSA. For every two fourth-layer CSAs, their sum outputs and the carry output of one of them shifted left by 1 bit are fed into a fifth-layer CSA. The carry output of the fifth-layer CSA shifted left by 1 bit, its sum output, and the carry output of the other of the two fourth-layer CSAs shifted left by 1 bit are fed into a sixth-layer CSA. For every two sixth-layer CSAs, their sum outputs and the carry output of one of them shifted left by 1 bit are fed into a seventh-layer CSA. The carry output of the seventh-layer CSA shifted left by 1 bit, its sum output, and the carry output of the other of the two sixth-layer CSAs shifted left by 1 bit are fed into an eighth-layer CSA. The eighth layer has two CSAs in total; the carry output of the second shifted left by 1 bit, the sum output of the second, and the sum output of the first are fed into the ninth-layer CSA (1 in total). The carry output of the ninth-layer CSA shifted left by 1 bit, its sum output, and the carry output of the first eighth-layer CSA shifted left by 1 bit are fed into the tenth-layer CSA. The carry output of the tenth-layer CSA shifted left by 1 bit and its sum output are fed into the CPA, and the result is fed into the modulo addition module. The modulo addition module takes the 193-bit-wide data and adds the low 192 bits to the 193rd bit; the output is congruent to the input data modulo the prime p.
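The role of the CSA layers can be illustrated with a small Python model. The sketch below shows the 3:2 compression step and a generic reduction loop followed by one carry-propagate addition; it does not reproduce the layer-by-layer wiring of Fig. 11, and the function names are hypothetical.

```python
def csa(a, b, c):
    """Carry-save adder (3:2 compressor).
    Returns (sum, carry) such that a + b + c == sum + (carry << 1)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def add_operands(operands):
    """Reduce a list of operands with CSAs, then finish with one ordinary addition."""
    pending = list(operands)
    while len(pending) > 2:
        a, b, c, *rest = pending
        s, carry = csa(a, b, c)
        pending = rest + [s, carry << 1]   # "<<1": the carry output has weight 2
    return sum(pending)                    # models the final CPA
```

Each CSA turns three numbers into two without propagating carries, so only the last addition needs a full carry chain.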
The 22-operand modular addition module is shown in Fig. 12, where CSA denotes a carry-save adder, CPA denotes a ripple-carry adder, and "ROL 1-bit" denotes circularly shifting the carry output of a carry-save adder left by 1 bit. The 22 operands are divided into two groups of 11, and the same operation is performed on each group, as follows. Operands 1, 2, 3; operands 5, 6, 7; and operands 9, 10, 11 of the 11 operands are fed into three first-layer CSAs respectively. The sum output of the first first-layer CSA, operand 4, and the carry output of the second first-layer CSA circularly shifted left by 1 bit are fed into the first second-layer CSA. Operand 8, the carry output of the third first-layer CSA circularly shifted left by 1 bit, and its sum output are fed into the second second-layer CSA. The carry output of the first first-layer CSA circularly shifted left by 1 bit, the carry output of the first second-layer CSA circularly shifted left by 1 bit, and its sum output are fed into the first third-layer CSA. The sum output of the second first-layer CSA, the carry output of the second second-layer CSA circularly shifted left by 1 bit, and its sum output are fed into the second third-layer CSA. The sum output of the first third-layer CSA, the carry output of the second third-layer CSA circularly shifted left by 1 bit, and its sum output are fed into the fourth-layer CSA. The carry output of the first third-layer CSA circularly shifted left by 1 bit, the carry output of the fourth-layer CSA circularly shifted left by 1 bit, and its sum output are fed into the fifth-layer CSA.
The sum output of the fifth-layer CSA of the first group, the carry output of the fifth-layer CSA of the second group circularly shifted left by 1 bit, and the sum output of the latter are fed into the sixth-layer CSA. The carry output of the fifth-layer CSA of the first group circularly shifted left by 1 bit, the carry output of the sixth-layer CSA circularly shifted left by 1 bit, and its sum output are fed into the seventh-layer CSA. The carry output of the seventh-layer CSA circularly shifted left by 1 bit and its sum output are fed into the CPA, and the result is fed into the modulo addition module. The modulo addition module takes the 193-bit-wide data and adds the low 192 bits to the 193rd bit; the output is congruent to the input data modulo the prime p.
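For 192-bit operands the carry is rotated rather than shifted, because $2^{192} \equiv 1 \pmod p$ lets a bit leaving the top re-enter at the bottom. The Python sketch below illustrates this principle with a generic reduction loop; it is not the wiring of Fig. 12, and the function names are hypothetical.

```python
P = 2**64 - 2**32 + 1
MASK = 2**192 - 1

def rol1(v):
    """Rotate a 192-bit value left by one bit ("ROL 1-bit")."""
    return ((v << 1) | (v >> 191)) & MASK

def csa_rol(a, b, c):
    """3:2 compressor whose carry output is rotated left by one bit.
    sum + carry is congruent to a + b + c modulo 2^192 - 1, hence modulo p."""
    return a ^ b ^ c, rol1((a & b) | (a & c) | (b & c))

def modular_add(operands):
    """Reduce 192-bit operands with rotating CSAs, one CPA and one wrap-around add."""
    pending = list(operands)
    while len(pending) > 2:
        a, b, c, *rest = pending
        pending = rest + list(csa_rol(a, b, c))
    wide = sum(pending)                      # CPA result, at most 193 bits
    return (wide & MASK) + (wide >> 192)     # low 192 bits plus the 193rd bit
```

All intermediate values stay 192 bits wide, and the result remains congruent to the plain sum of the operands modulo p, since p divides 2^192 - 1.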
The 32-operand modular addition module is shown in Fig. 13, where CSA denotes a carry-save adder, CPA denotes a ripple-carry adder, and "ROL 1-bit" denotes circularly shifting the carry output of a carry-save adder left by 1 bit. Of the 32 operands, those at positions 4i, i = 1, 2, …, 8, are held back, and the remaining operands are fed three at a time into the first-layer CSAs. The carry output of each first-layer CSA circularly shifted left by 1 bit, its sum output, and the operand at position 4i, i = 1, 2, …, 8, are fed into a second-layer CSA. For every two second-layer CSAs, their sum outputs and the carry output of one of them circularly shifted left by 1 bit are fed into a third-layer CSA. The carry output of the third-layer CSA circularly shifted left by 1 bit, its sum output, and the carry output of the other of the two second-layer CSAs circularly shifted left by 1 bit are fed into a fourth-layer CSA. For every two fourth-layer CSAs, their sum outputs and the carry output of one of them circularly shifted left by 1 bit are fed into a fifth-layer CSA. The carry output of the fifth-layer CSA circularly shifted left by 1 bit, its sum output, and the carry output of the other of the two fourth-layer CSAs circularly shifted left by 1 bit are fed into a sixth-layer CSA. The sixth layer has two CSAs in total; the carry output of the second circularly shifted left by 1 bit, the sum output of the second, and the sum output of the first are fed into the seventh-layer CSA (1 in total). The carry output of the seventh-layer CSA circularly shifted left by 1 bit, its sum output, and the carry output of the first sixth-layer CSA circularly shifted left by 1 bit are fed into the eighth-layer CSA. The carry output of the eighth-layer CSA circularly shifted left by 1 bit and its sum output are fed into the CPA, and the result is fed into the modulo addition module.
The modulo addition module takes the 193-bit-wide data and adds the low 192 bits to the 193rd bit; the output is congruent to the input data modulo the prime p.
The 24-operand modular addition module is shown in Fig. 14, where CSA denotes a carry-save adder, CPA denotes a ripple-carry adder, and "ROL 1-bit" denotes circularly shifting the carry output of a carry-save adder left by 1 bit. The 24 operands are fed three at a time into the first-layer CSAs. For every two first-layer CSAs, their sum outputs and the carry output of one of them circularly shifted left by 1 bit are fed into a second-layer CSA. The carry output of the second-layer CSA circularly shifted left by 1 bit, its sum output, and the carry output of the other of the two first-layer CSAs circularly shifted left by 1 bit are fed into a third-layer CSA. For every two third-layer CSAs, their sum outputs and the carry output of one of them circularly shifted left by 1 bit are fed into a fourth-layer CSA. The carry output of the fourth-layer CSA circularly shifted left by 1 bit, its sum output, and the carry output of the other of the two third-layer CSAs circularly shifted left by 1 bit are fed into a fifth-layer CSA. The fifth layer has two CSAs in total; the carry output of the second circularly shifted left by 1 bit, the sum output of the second, and the sum output of the first are fed into the sixth-layer CSA (1 in total). The carry output of the sixth-layer CSA circularly shifted left by 1 bit, its sum output, and the carry output of the first fifth-layer CSA circularly shifted left by 1 bit are fed into the seventh-layer CSA. The carry output of the seventh-layer CSA circularly shifted left by 1 bit and its sum output are fed into the CPA, and the result is fed into the modulo addition module. The modulo addition module takes the 193-bit-wide data and adds the low 192 bits to the 193rd bit; the output is congruent to the input data modulo the prime p.
The modulo-p module reduces its input data modulo the prime p.
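The text does not describe the internals of the modulo-p module. One possible way to perform the reduction with shifts, additions and subtractions, using only the identities $2^{192} \equiv 1$, $2^{96} \equiv -1$ and $2^{64} \equiv 2^{32} - 1 \pmod p$, is sketched below; the function names and the structure are assumptions made for illustration.

```python
P = 2**64 - 2**32 + 1

def fold96(w):
    """Partially reduce w using 2^64 = 2^32 - 1 (mod p); no multiplication needed."""
    low64, high = w & (2**64 - 1), w >> 64
    return low64 + (high << 32) - high

def mod_p(v):
    """Reduce a value of at most 193 bits modulo p."""
    v = (v & (2**192 - 1)) + (v >> 192)        # 2^192 = 1 (mod p)
    low96, high96 = v & (2**96 - 1), v >> 96   # 2^96 = -1 (mod p)
    r = fold96(low96) - fold96(high96)
    # Final corrections; hardware would use a small fixed number of
    # conditional additions or subtractions of p instead of loops.
    while r < 0:
        r += P
    while r >= P:
        r -= P
    return r
```

The intermediate value after the two folds is bounded by a few multiples of p, so the correction loops run only a handful of times.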