CN103186360A - Fast multi-bit serial systolic dual-basis binary finite-field multiplier
- Publication number
- CN103186360A
- Authority
- CN
- China
- Prior art keywords
- module
- input
- result
- xor
- frrp
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Complex Calculations (AREA)
Abstract
The invention relates to a fast multi-bit serial systolic dual-basis binary finite-field multiplier comprising an input B, k PE modules, an FRRP module, and an R3 module. The k PE modules are connected in series and operate over k cycles. In the first cycle, the input of A is $A_0, A_1, \dots, A_{k-1}$ and B is input directly; the result is restored by the FRRP module and written to register C. In the second cycle, the input of A is $A_k, A_{k+1}, \dots, A_{2k-1}$ and B is input through the R3 module; the result is again restored by the FRRP module, added to the result of the first cycle, and stored in register C. Continuing in this way, in the k-th cycle the input of A is $A_{(k-1)k}, \dots, A_{k^2-1}$ and B is input after passing through the R3 module (k-1) times; the result is restored by the FRRP module, added to the accumulated result of the previous (k-1) cycles, and stored in register C, which then outputs the result.
Description
Technical field
The invention relates to a binary finite-field multiplier, and in particular to a fast multi-bit serial systolic dual-basis binary finite-field multiplier.
Background
In recent years, elliptic curve cryptography (ECC) [1], [2] has become closely tied to cryptographic research. With the adoption of ECC in public-key cryptosystems, a number of hardware implementation issues have been raised. NIST recommends five binary fields: $GF(2^{163})$, $GF(2^{233})$, $GF(2^{283})$, $GF(2^{409})$, and $GF(2^{571})$. In ECC-based cryptographic protocols, finite-field multiplication is an essential building block of point multiplication. The efficiency of cryptosystem hardware is usually measured by area, energy consumption, and performance.
For high-speed very-large-scale integration (VLSI) implementations, a systolic array structure is the preferred choice. Over extended binary fields, many efficient systolic array multipliers have been designed; they can be classified into bit-parallel and bit-serial architectures. Efficient bit-parallel systolic multipliers usually adopt an LSB-first or MSB-first algorithm, and their main advantage is high throughput over the whole computation. However, for polynomials over binary fields these structures require $O(m^2)$ XOR gates, $O(m^2)$ AND gates, $O(m^2)$ one-bit latches, and $O(m)$ latency. To reduce time and space complexity, Lee [8], [9], [13] showed that for certain special polynomials, such as all-one polynomials, pentanomials, and trinomials, finite-field multiplication can be formulated as a Toeplitz matrix-vector product (TMVP) and used to build fully bit-parallel systolic multipliers. Bit-serial systolic array multipliers need only $O(m)$ space complexity, but they incur a longer computation latency.
As a trade-off between time complexity and space complexity, digit-serial systolic multipliers, which sit between bit-parallel and bit-serial multipliers, have been proposed. A digit-serial polynomial-basis multiplier that is digit-level internally and parallel externally was proposed in [20]. In such a multiplier, the m bits of a field element are subdivided into […] sub-words of d bits each. In each clock cycle one d-bit digit is processed, until the m-bit multiplication is complete. A scalable systolic multiplier built on a d×d-bit parallel Hankel matrix-vector product was proposed in [15], [16]; its latency is […] clock cycles. Multi-bit serial systolic multipliers that use different structures internally and externally have also been presented in the literature; their latency is […] clock cycles. As mentioned above, the design of a low-complexity systolic finite-field multiplier depends on the choice of irreducible polynomial and of the basis of representation, and these digit-serial multipliers need a high latency to complete a multiplication.
Summary of the invention
The technical problem solved by the invention is to construct a fast multi-bit serial systolic dual-basis binary finite-field multiplier that overcomes the high latency existing multipliers need to complete a multiplication.
The technical solution of the invention is to construct a fast multi-bit serial systolic dual-basis binary finite-field multiplier comprising an input B, k PE modules, an FRRP module, and an R3 module. The k PE modules are connected in series and operate over k cycles. In the first cycle, the input of A is $A_0, A_1, \dots, A_{k-1}$ and B is input directly; the result is restored by the FRRP module and written to register C. In the second cycle, the input of A is $A_k, A_{k+1}, \dots, A_{2k-1}$ and B is input through the R3 module; the result is again restored by the FRRP module, added to the result of the first cycle, and stored in register C. Continuing in this way, in the k-th cycle the input of A is $A_{(k-1)k}, \dots, A_{k^2-1}$ and B is input after passing through the R3 module (k-1) times; the result is restored by the FRRP module, added to the accumulated result of the previous (k-1) cycles, and stored in register C, which then outputs the result. The R3 module computes $Bx^{kd} \bmod F(x)$. Each PE module comprises an R1 module, a CMP module, a CVP module, a PWM module, […] XOR gates, and […] latches. The output of the R3 module is fed to the R1 module and then to the CMP module for coefficient conversion; a segment of A is fed to the CVP module for coefficient conversion of that segment; the outputs of the CMP and CVP modules both enter the PWM module, which computes the product of $B_{in}$ and the segment of A. The product is accumulated through the […] XOR gates, stored in the […] latches, and output from the latches as […]. Here A belongs to the field defined by the trinomial $F(x) = 1 + x^n + x^m$ and is written $A = a_0 + a_1x + \dots + a_{m-1}x^{m-1}$, with m coefficients $(a_0, a_1, \dots, a_{m-1})$. Using segmentation, the m-bit A is cut into segments of d bits each, for a total of $k^2$ segments, so that […]. B is expressed in the dual basis as $B = b_0\beta_0 + b_1\beta_1 + \dots + b_{m-1}\beta_{m-1}$ and is the other input of the multiplier; C is the output.
In a further technical solution of the invention, the FRRP module comprises an FR module and an R2 module. The R2 module computes $C \bmod (x^m + 1)$. The FR module takes the results of the k series-connected PE modules as input, restores them, and outputs them to the R2 module.
In a further technical solution of the invention, the CMP module comprises XOR gates XOR_1 and XOR_2, connected in parallel.
In a further technical solution of the invention, the CVP module is an XOR gate XOR_3.
In a further technical solution of the invention, the PWM module comprises three AND gates AND_1, AND_2, and AND_3, connected in parallel, which multiply the outputs of the CMP module and the CVP module point-wise.
In a further technical solution of the invention, the FR module comprises two XOR gates XOR_4 and XOR_5, connected in parallel.
The technical effect of the invention is as follows. A fast multi-bit serial systolic dual-basis binary finite-field multiplier is constructed, comprising an input B, k PE modules, an FRRP module, and an R3 module. The k PE modules are connected in series and operate over k cycles. In the first cycle, the input of A is $(A_0, A_1, \dots, A_{k-1})$ and B is input directly; the result is restored by the FRRP module and written to register C. In the second cycle, the input of A is $(A_k, A_{k+1}, \dots, A_{2k-1})$ and B is input through the R3 module; the result is again restored by the FRRP module, added to the result of the first cycle, and stored in register C. Continuing in this way, in the k-th cycle the input of A is $(A_{(k-1)k}, \dots, A_{k^2-1})$ and B is input after passing through the R3 module (k-1) times; the result is restored by the FRRP module, added to the accumulated result of the previous (k-1) cycles, and stored in register C, which then outputs the result. The invention combines the polynomial basis and the MPB to build a dual-basis multiplication. Some finite-field multiplications can be realized in bit-parallel structures through subquadratic-space TMVP. Over the binary field $GF(2^m)$, irreducible trinomials and pentanomials are widely used in cryptography, where the bit length is usually large. The invention proposes a new digit-serial systolic dual-basis multiplier based on the subquadratic TMVP formula; once a d×d Toeplitz multiplication has been chosen, the proposed structure needs very few clock cycles.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the invention.
Fig. 2 is a structural diagram of the multi-bit serial systolic multiplier of the invention.
Fig. 3 is a structural diagram of the processing element PE of the invention.
Fig. 4 is a detailed circuit diagram of the PE module of the invention.
Detailed description
The technical solution of the invention is further described below with reference to specific embodiments.
As shown in Fig. 2, a specific embodiment of the invention is a fast multi-bit serial systolic dual-basis binary finite-field multiplier comprising an input B, k PE modules, an FRRP module, and an R3 module. The k PE modules are connected in series and operate over k cycles. In the first cycle, the input of A is $A_0, A_1, \dots, A_{k-1}$ and B is input directly; the result is restored by the FRRP module and written to register C. In the second cycle, the input of A is $A_k, A_{k+1}, \dots, A_{2k-1}$ and B is input through the R3 module; the result is again restored by the FRRP module, added to the result of the first cycle, and stored in register C. Continuing in this way, in the k-th cycle the input of A is $A_{(k-1)k}, \dots, A_{k^2-1}$ and B is input after passing through the R3 module (k-1) times; the result is restored by the FRRP module, added to the accumulated result of the previous (k-1) cycles, and stored in register C, which then outputs the result. The R3 module computes $Bx^{kd} \bmod F(x)$. Each PE module comprises an R1 module, a CMP module, a CVP module, a PWM module, […] XOR gates, and […] latches. The output of the R3 module is fed to the R1 module and then to the CMP module for coefficient conversion; a segment of A is fed to the CVP module for coefficient conversion of that segment; the outputs of the CMP and CVP modules both enter the PWM module, which computes the product of $B_{in}$ and the segment of A. The product is accumulated through the […] XOR gates, stored in the […] latches, and output from the latches as […]. Here A belongs to the field defined by the trinomial $F(x) = 1 + x^n + x^m$ and is written $A = a_0 + a_1x + \dots + a_{m-1}x^{m-1}$, with m coefficients $(a_0, a_1, \dots, a_{m-1})$. Using segmentation, the m-bit A is cut into segments of d bits each, for a total of $k^2$ segments, so that […]. B is expressed in the dual basis as $B = b_0\beta_0 + b_1\beta_1 + \dots + b_{m-1}\beta_{m-1}$ and is the other input of the multiplier; C is the output.
In a preferred embodiment of the invention, the FRRP module comprises an FR module and an R2 module. The R2 module computes $C \bmod (x^m + 1)$. The FR module takes the results of the k series-connected PE modules as input, restores them, and outputs them to the R2 module.
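The reduction attributed to the R2 module can be illustrated in software. The following is a minimal sketch, assuming a polynomial over GF(2) is stored as a Python integer whose bit i holds the coefficient of $x^i$; the function name `mod_xm_plus_1` is hypothetical and does not come from the patent. Because $x^m \equiv 1$ modulo $x^m + 1$, every term $x^{m+i}$ folds back onto $x^i$.

```python
def mod_xm_plus_1(c: int, m: int) -> int:
    """Reduce polynomial c modulo x^m + 1 (coefficients in GF(2)).

    Hypothetical helper: bit i of c is the coefficient of x^i.
    """
    mask = (1 << m) - 1
    while c >> m:                    # terms of degree >= m remain
        c = (c & mask) ^ (c >> m)    # x^(m+i) folds onto x^i
    return c


# x^6 + x reduced modulo x^4 + 1 gives x^2 + x
print(bin(mod_xm_plus_1(0b1000010, 4)))  # 0b110
```

In hardware this fold needs only XOR gates, since no multiplication by feedback coefficients is involved.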
The inputs of the CMP module and the CVP module are $B_{in}$ and […] respectively, and their outputs both serve as inputs of the PWM module. The output of the PWM module passes through […] XOR gates and […] latches and is output as […]. The input of the R1 module is $B_{in}$; its output passes through m latches and is output as $B_{out}$. The input of the CMP module is $Bx^{dk(i+1)+jd}$ and its output is $[B^{(p+q)}, B^{(p+q+1)}, \dots, B^{(p+q+d-1)}]$; the input of the CVP module is $A_{ik+j}$ and its output is $[a_q, a_{q+1}, \dots, a_{q+d-1}]^T$. Here […] denotes the number of rows and columns of the matrix into which […] is arranged; $i, j = 0, 1, \dots, k-1$, where i is the row index and j the column index of that matrix; $p = dk(i+1)+jd$; $q = (ik+j)d$; and T denotes the transpose of $[a_q, a_{q+1}, \dots, a_{q+d-1}]$. The output is accumulated with the result of the previous FRRP module and passed on to the next FRRP module.
Fig. 1 shows the structure of the complete systolic-array dual-basis multiplier. A, B, and C are three elements of $GF(2^m)$, the field defined by the irreducible trinomial $F(x) = 1 + x^n + x^m$ with $n \le m/2$. Element A is given in the polynomial basis, while B and C are given in the dual basis. The multiplier as a whole computes $C = AB \bmod F(x)$, with A and B as inputs and C as the output. A is written $A = a_0 + a_1x + \dots + a_{m-1}x^{m-1}$, with m coefficients $(a_0, a_1, \dots, a_{m-1})$. Using segmentation, the m-bit A is cut into segments of d bits each, for a total of $k^2$ segments, so that […]. Each segment $A_i$ can be written $A_i = a_{id} + a_{id+1}x + \dots + a_{id+d-1}x^{d-1}$, and the segments together replace A as the input of the multiplier. B is expressed in the dual basis as $B = b_0\beta_0 + b_1\beta_1 + \dots + b_{m-1}\beta_{m-1}$ and is the other input of the multiplier. C is the output, obtained as $C = AB \bmod F(x)$, which is the function implemented by the whole multiplier.
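The segmentation of A described above can be sketched as follows. This is an illustrative example only: it assumes A is stored as a Python integer with bit i holding $a_i$, and the helper names `split_segments` and `join_segments` as well as the toy parameters (m = 9, d = 3, k = 2) are hypothetical. When $k^2 d$ exceeds m, the unused top bits are simply zero.

```python
def split_segments(a: int, d: int, k: int) -> list[int]:
    """Cut polynomial a into k*k segments A_0 .. A_{k*k-1} of d bits each."""
    mask = (1 << d) - 1
    return [(a >> (i * d)) & mask for i in range(k * k)]


def join_segments(segments: list[int], d: int) -> int:
    """Rebuild a from its segments: a = sum over i of A_i * x^(i*d)."""
    a = 0
    for i, seg in enumerate(segments):
        a ^= seg << (i * d)
    return a


# Toy parameters (assumed for illustration): m = 9, d = 3, k = 2
a = 0b101100111
segments = split_segments(a, d=3, k=2)
print(segments)  # [7, 4, 5, 0]
assert join_segments(segments, d=3) == a
```

Each list entry corresponds to one segment $A_i = a_{id} + a_{id+1}x + \dots + a_{id+d-1}x^{d-1}$.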
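Since A is divided into the $k^2$ segments $A_0, A_1, \dots, A_{k^2-1}$, A can be expressed as
$$A = \sum_{i=0}^{k-1}\sum_{j=0}^{k-1} A_{ik+j}\,x^{(ik+j)d},$$
where $A_{ik+j} = a_{(ik+j)d} + a_{(ik+j)d+1}x + \dots + a_{(ik+j)d+d-1}x^{d-1}$.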
In the multiplier structure of Fig. 1, the first row computes $C_0 = B(A_0 + A_1x^d + \dots + A_{k-1}x^{(k-1)d})$. Its first processing element $PE_{0,0}$ computes the product $BA_0$, its second processing element $PE_{0,1}$ computes $BA_1x^d$, and so on, up to the k-th processing element $PE_{0,k-1}$, which computes $BA_{k-1}x^{(k-1)d}$. The results of the k processing elements are accumulated to give $C_0$, which is fed to the first FRRP (Final Reconstruction-Reduction-Polynomial) module. Likewise, the second row computes $C_1 = Bx^{dk}(A_k + A_{k+1}x^d + \dots + A_{2k-1}x^{(k-1)d})$; the added R3 module computes $Bx^{dk} \bmod F(x)$ from its input B. Its first processing element $PE_{1,0}$ computes the product $Bx^{dk}A_k$, and the rest of the row proceeds as in the first row. The result $C_1$ is fed to the second FRRP module and accumulated with the output of the first FRRP module to give $(C_0 + C_1) \bmod F(x)$. Every row performs a similar computation, down to the k-th row, whose R3 module outputs $Bx^{dk(k-1)} \bmod F(x)$. The k-th FRRP module takes $C_{k-1}$ as input and outputs $(C_0 + C_1 + \dots + C_{k-1}) \bmod F(x)$, which is the result of the whole multiplier: $C = (C_0 + C_1 + \dots + C_{k-1}) \bmod F(x)$.
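The row-by-row decomposition above can be checked numerically. The sketch below is not the patented circuit: it assumes both operands are held in the ordinary polynomial basis as Python integers (the dual-basis representation of B and the FRRP restoration step are not modelled), and the helper names and toy parameters are hypothetical. It only confirms that the per-row results $C_i$ sum to $AB \bmod F(x)$.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two GF(2) polynomials stored as ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce_trinomial(c: int, m: int, n: int) -> int:
    """Reduce c modulo F(x) = 1 + x^n + x^m, using x^m = x^n + 1."""
    while c.bit_length() > m:
        t = c.bit_length() - 1  # degree of the leading term
        c ^= (1 << t) | (1 << (t - m + n)) | (1 << (t - m))
    return c


def row_results(a: int, b: int, m: int, n: int, d: int, k: int) -> list[int]:
    """Return [C_0, ..., C_{k-1}], one partial result per row of PEs."""
    mask = (1 << d) - 1
    rows = []
    for i in range(k):
        b_row = reduce_trinomial(b << (d * k * i), m, n)  # B x^(dki) mod F(x)
        c_i = 0
        for j in range(k):
            seg = (a >> ((i * k + j) * d)) & mask          # segment A_{ik+j}
            c_i ^= clmul(b_row, seg << (j * d))            # times x^(jd)
        rows.append(reduce_trinomial(c_i, m, n))
    return rows


# Toy parameters (assumed): F(x) = 1 + x^4 + x^9, d = 3, k = 2
m, n, d, k = 9, 4, 3, 2
a, b = 0b101100111, 0b110010101
c = 0
for c_i in row_results(a, b, m, n, d, k):
    c ^= c_i
assert c == reduce_trinomial(clmul(a, b), m, n)
```

Addition in $GF(2^m)$ is bitwise XOR, which is why the row results are combined with `^`.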
The detailed circuit of each processing element $PE_{i,j}$ is shown in Fig. 3; it computes the product $Bx^{dk(i+1)+jd}A_{ik+j}$. $A_{in}$, $B_{in}$, and […] are its inputs, and $B_{out}$ and […] are its outputs. For the first processing element of each row, $PE_{i,0}$, $A_{in}$ is $A_{ik}$, $B_{in}$ is the output of the (i+1)-th R3 module, namely $Bx^{dk(i+1)} \bmod F(x)$, and […] is initialized to 0. $B_{out}$, the output of R1, is also the input of the second processing element $PE_{i,1}$ and equals $Bx^{dk(i+1)+d} \bmod F(x)$. The output […] is the product $Bx^{dk(i+1)}A_{ik}$. For the second processing element of each row, $PE_{i,1}$, $A_{in}$ is $A_{ik+1}$, $B_{in}$ is $Bx^{dk(i+1)+d} \bmod F(x)$, and the input […] is the result of the first processing element $PE_{i,0}$, namely $Bx^{dk(i+1)}A_{ik}$. Its $B_{out}$ is $Bx^{dk(i+1)+2d} \bmod F(x)$, which serves as the $B_{in}$ input of the third processing element $PE_{i,2}$, and its output […] is the product $Bx^{dk(i+1)+d}A_{ik+1}$. In general, the (j+1)-th processing element of each row, $PE_{i,j}$, computes the product $Bx^{dk(i+1)+jd}A_{ik+j}$: its $A_{in}$ is $A_{ik+j}$, its $B_{in}$ is $Bx^{dk(i+1)+jd} \bmod F(x)$, its input […] is the output […] of the j-th module, namely $Bx^{dk(i+1)+(j-1)d}A_{ik+(j-1)}$, its $B_{out}$ is $Bx^{dk(i+1)+(j+1)d} \bmod F(x)$, and its output […] is the product $Bx^{dk(i+1)+jd}A_{ik+j}$.
Expanding $Bx^{dk(i+1)+jd}$ and $A_{ik+j}$ separately, that is, $Bx^{dk(i+1)+jd} = (b_0\beta_0 + b_1\beta_1 + \dots + b_{m-1}\beta_{m-1})x^{dk(i+1)+jd}$ and $A_{ik+j} = a_{(ik+j)d} + a_{(ik+j)d+1}x + \dots + a_{(ik+j)d+d-1}x^{d-1}$, the rules of dual-basis multiplication give:
$$
\begin{aligned}
Bx^{dk(i+1)+jd}A_{ik+j}
&= (b_0\beta_0 + b_1\beta_1 + \dots + b_{m-1}\beta_{m-1})\,x^{dk(i+1)+jd}A_{ik+j} \\
&= (b_0^{(p)}\beta_0 + b_1^{(p)}\beta_1 + \dots + b_{m-1}^{(p)}\beta_{m-1})\,A_{ik+j} \\
&= (a_{(ik+j)d} + a_{(ik+j)d+1}x + \dots + a_{(ik+j)d+d-1}x^{d-1})\,B^{(p)} \\
&= a_qB^{(p)} + a_{q+1}xB^{(p)} + \dots + a_{q+d-1}x^{d-1}B^{(p)} \\
&= a_qB^{(p+q)} + a_{q+1}B^{(p+q+1)} + \dots + a_{q+d-1}B^{(p+q+d-1)} \\
&= [B^{(p+q)}, B^{(p+q+1)}, \dots, B^{(p+q+d-1)}]\,[a_q, a_{q+1}, \dots, a_{q+d-1}]^T
\end{aligned}
$$
where $p = dk(i+1)+jd$, $q = (ik+j)d$, and $B^{(p)} = b_0^{(p)}\beta_0 + b_1^{(p)}\beta_1 + \dots + b_{m-1}^{(p)}\beta_{m-1}$.
In the detailed circuit of the processing element $PE_{i,j}$ in Fig. 3, the input of the CMP module is $Bx^{dk(i+1)+jd}$ and its output is $[B^{(p+q)}, B^{(p+q+1)}, \dots, B^{(p+q+d-1)}]$; the input of the CVP module is $A_{ik+j}$ and its output is $[a_q, a_{q+1}, \dots, a_{q+d-1}]^T$. The PWM module computes the product $[B^{(p+q)}, B^{(p+q+1)}, \dots, B^{(p+q+d-1)}][a_q, a_{q+1}, \dots, a_{q+d-1}]^T$, which is then added to […]; the sum is written to latch L and output from latch L as […]. The input of the R1 module is $B_{in}$; it computes $x^dB_{in} \bmod F(x)$, stores the result in latch L, and outputs it from latch L as $B_{out}$.
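The data flow through one processing element can be sketched behaviourally. This is a simplified model under stated assumptions, not the circuit of Fig. 3: the CMP, CVP, and PWM stages are collapsed into a single polynomial product, B is held in the polynomial basis, and the function name `pe` and the toy parameters are hypothetical. The point is the chaining: each element accumulates into the running sum and hands $x^dB_{in} \bmod F(x)$ to its neighbour.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two GF(2) polynomials stored as ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce_trinomial(c: int, m: int, n: int) -> int:
    """Reduce c modulo F(x) = 1 + x^n + x^m, using x^m = x^n + 1."""
    while c.bit_length() > m:
        t = c.bit_length() - 1
        c ^= (1 << t) | (1 << (t - m + n)) | (1 << (t - m))
    return c


def pe(a_seg: int, b_in: int, c_in: int, m: int, n: int, d: int):
    """Behavioural model of one processing element: returns (b_out, c_out)."""
    c_out = c_in ^ reduce_trinomial(clmul(b_in, a_seg), m, n)  # product, then XOR accumulate
    b_out = reduce_trinomial(b_in << d, m, n)                   # R1: x^d * B_in mod F(x)
    return b_out, c_out


# Toy parameters (assumed): F(x) = 1 + x^4 + x^9, d = 3, a row of two elements
m, n, d = 9, 4, 3
b = 0b110010101
row_segments = [0b111, 0b100]   # A_0, A_1
b_in, c_acc = b, 0              # the accumulator input starts at 0
for seg in row_segments:
    b_in, c_acc = pe(seg, b_in, c_acc, m, n, d)

row_a = 0b111 ^ (0b100 << d)    # A_0 + A_1 x^d
assert c_acc == reduce_trinomial(clmul(b, row_a), m, n)
```

Passing the shifted B along the row means each element multiplies by an unshifted d-bit segment, so all elements in the row can be identical.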
When computing $[B^{(p+q)}, B^{(p+q+1)}, \dots, B^{(p+q+d-1)}][a_q, a_{q+1}, \dots, a_{q+d-1}]^T$, the product is a Toeplitz matrix-vector product, so it is split into […].
Fig. 4 shows the detailed circuits of the CMP, CVP, and PWM modules of the processing element PE. The CMP module takes $(t_0, t_1, t_2)$ as input and, through XOR gates XOR_1 and XOR_2, outputs $(t_0 + t_1, t_1, t_1 + t_2)$. The CVP module takes $(v_0, v_1)$ as input and, through XOR gate XOR_3, outputs $(v_0, v_0 + v_1, v_1)$. The PWM module multiplies the outputs of the CMP and CVP modules point-wise through the three AND gates AND_1, AND_2, and AND_3, giving $(v_0(t_0 + t_1), t_1(v_0 + v_1), v_1(t_2 + t_1))$. The FR restoration module uses the two XOR gates XOR_4 and XOR_5 to compute $c_0 = t_1(v_0 + v_1) + v_1(t_2 + t_1)$ and $c_1 = t_1(v_0 + v_1) + v_0(t_0 + t_1)$, and outputs $(c_0, c_1)$.
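The gate-level description of Fig. 4 can be checked exhaustively on single bits. The sketch below models each module as a small function (the function names are hypothetical) and compares the result with the direct product of the 2×2 Toeplitz matrix with rows $(t_1, t_2)$ and $(t_0, t_1)$ and the vector $(v_0, v_1)^T$; that matrix layout is inferred here by expanding the stated expressions for $c_0$ and $c_1$, not quoted from the patent.

```python
from itertools import product


def cmp_module(t0: int, t1: int, t2: int):
    """CMP: XOR_1 and XOR_2 form (t0 + t1, t1, t1 + t2)."""
    return (t0 ^ t1, t1, t1 ^ t2)


def cvp_module(v0: int, v1: int):
    """CVP: XOR_3 forms (v0, v0 + v1, v1)."""
    return (v0, v0 ^ v1, v1)


def pwm_module(tc, vc):
    """PWM: AND_1..AND_3 multiply the two triples point-wise."""
    return tuple(t & v for t, v in zip(tc, vc))


def fr_module(p):
    """FR: XOR_4 and XOR_5 recombine the three partial products."""
    p0, p1, p2 = p
    return (p1 ^ p2, p1 ^ p0)   # (c0, c1)


# Compare against the direct four-multiplication product for all 32 inputs.
for t0, t1, t2, v0, v1 in product((0, 1), repeat=5):
    c0, c1 = fr_module(pwm_module(cmp_module(t0, t1, t2), cvp_module(v0, v1)))
    assert c0 == (t1 & v0) ^ (t2 & v1)
    assert c1 == (t0 & v0) ^ (t1 & v1)
```

The three-AND form replaces the four ANDs of the direct product with three ANDs plus extra XORs, which is the trade that makes recursive TMVP schemes subquadratic.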
Fig. 2 shows the multi-bit serial systolic multiplier architecture proposed by the invention, obtained by folding the structure of Fig. 1. Fig. 1 uses $k^2$ processing elements PE, and the k processing elements of every row have the same structure and function, so the k processing elements of the first row can stand in for those of the remaining rows, at the cost of k cycles. In the first cycle, the input of A is $(A_0, A_1, \dots, A_{k-1})$ and B is input directly; the result passes through the FRRP restoration module and is written to register C. In the second cycle, the input of A is $(A_k, A_{k+1}, \dots, A_{2k-1})$ and B is input through the R3 module; the result again passes through the FRRP restoration module, is added to the result of the first cycle, and is stored in register C. Continuing in this way up to the k-th cycle, the input of A is $(A_{(k-1)k}, \dots, A_{k^2-1})$ and B is input after passing through the R3 module (k-1) times; the result passes through the FRRP restoration module, is added to the accumulated result of the previous (k-1) cycles, and is stored in register C. Register C then outputs the result $C = AB \bmod F(x)$.
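The folded schedule can be sketched as a loop over k cycles that reuses one row of k processing elements. As before, this is a behavioural sketch under assumptions rather than the patented hardware: operands are kept in the polynomial basis, the FRRP restoration is not modelled, and the function name `folded_multiply` is hypothetical. The example parameters use the trinomial $x^{233} + x^{74} + 1$ with illustrative choices d = 15 and k = 4, so that $k^2d = 240 \ge 233$.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two GF(2) polynomials stored as ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce_trinomial(c: int, m: int, n: int) -> int:
    """Reduce c modulo F(x) = 1 + x^n + x^m, using x^m = x^n + 1."""
    while c.bit_length() > m:
        t = c.bit_length() - 1
        c ^= (1 << t) | (1 << (t - m + n)) | (1 << (t - m))
    return c


def folded_multiply(a: int, b: int, m: int, n: int, d: int, k: int) -> int:
    """Model of the folded array: k cycles through one row of k PEs."""
    mask = (1 << d) - 1
    c_reg = 0    # register C
    b_row = b    # B enters directly in the first cycle
    for cycle in range(k):
        b_in, c_row = b_row, 0
        for j in range(k):                                  # the k series-connected PEs
            seg = (a >> ((cycle * k + j) * d)) & mask       # segment A_{cycle*k + j}
            c_row ^= reduce_trinomial(clmul(b_in, seg), m, n)
            b_in = reduce_trinomial(b_in << d, m, n)        # R1: x^d * B_in mod F(x)
        c_reg ^= c_row                                      # accumulate into register C
        b_row = reduce_trinomial(b_row << (k * d), m, n)    # R3: B x^(kd) mod F(x)
    return c_reg


m, n, d, k = 233, 74, 15, 4
a = (1 << 232) | 0x123456789ABCDEF0123456789ABCDEF
b = (1 << 230) | 0xFEDCBA9876543210FEDCBA987654321
assert folded_multiply(a, b, m, n, d, k) == reduce_trinomial(clmul(a, b), m, n)
```

Folding trades area for time: the sketch touches the same $k^2$ segment products as the unfolded array, but only k of them per cycle.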
The above is a further detailed description of the invention in connection with specific preferred embodiments, and the specific implementation of the invention should not be regarded as limited to this description. A person of ordinary skill in the art to which the invention belongs may make simple deductions or substitutions without departing from the concept of the invention, and all of these shall be regarded as falling within the scope of protection of the invention.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310115401.7A CN103186360B (en) | 2013-04-03 | 2013-04-03 | Fast multi-bit serial systolic dual-basis binary finite-field multiplier |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310115401.7A CN103186360B (en) | 2013-04-03 | 2013-04-03 | Fast multi-bit serial systolic dual-basis binary finite-field multiplier |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103186360A true CN103186360A (en) | 2013-07-03 |
CN103186360B CN103186360B (en) | 2016-08-03 |
Family
ID=48677539
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310115401.7A Expired - Fee Related CN103186360B (en) | 2013-04-03 | 2013-04-03 | Fast multi-bit serial systolic dual-basis binary finite-field multiplier |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103186360B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW527561B (en) * | 2001-11-02 | 2003-04-11 | Chiou-Ying Lee | Low-complexity bit-parallel systolic multiplier over GF(2^m) |
TW200710716A (en) * | 2006-11-24 | 2007-03-16 | Univ Lunghwa Sci & Technology | Low-complexity finite field GF(2^m) bit-parallel systolic array dual-basis multiplier |
CN102073477A (en) * | 2010-11-29 | 2011-05-25 | Beihang University (北京航空航天大学) | Implementation method of a finite field multiplier with error detection, correction, and location functions |
CN102929574A (en) * | 2012-10-18 | 2013-02-13 | Fudan University (复旦大学) | Design method of a systolic multiplier over GF(2^163) |
Non-Patent Citations (2)
Title |
---|
Chiou-Yng Lee: "Low-Complexity Bit-Parallel Systolic Montgomery Multipliers for Special Classes of GF(2^m)", IEEE Transactions on Computers, vol. 54, no. 9, 25 July 2005 (2005-07-25), pages 1061 - 1070 * |
Haining Fan et al.: "Subquadratic Computational Complexity Schemes for Extended Binary Field Multiplication Using Optimal Normal Bases", IEEE Transactions on Computers, vol. 56, no. 10, 25 October 2007 (2007-10-25), pages 1435 - 1437, XP011191962, DOI: doi:10.1109/TC.2007.1076 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104252332A (en) * | 2014-08-20 | 2014-12-31 | Harbin Institute of Technology Shenzhen Graduate School (哈尔滨工业大学深圳研究生院) | Multiplier processing unit and multiplier for an elliptic curve cryptography device |
CN104252332B (en) * | 2014-08-20 | 2018-09-18 | Harbin Institute of Technology Shenzhen Graduate School (哈尔滨工业大学深圳研究生院) | Multiplier processing unit and multiplier for an elliptic curve cryptography device |
Also Published As
Publication number | Publication date |
---|---|
CN103186360B (en) | 2016-08-03 |
Similar Documents
Publication | Title |
---|---|
Lee | Low complexity bit-parallel systolic multiplier over GF(2^m) using irreducible trinomials |
Kim et al. | FPGA implementation of high performance elliptic curve cryptographic processor over GF(2^163) |
CN104184578B (en) | FPGA-based elliptic curve scalar multiplication acceleration circuit and its algorithm |
Xie et al. | High-throughput finite field multipliers using redundant basis for FPGA and ASIC implementations |
Reyhani-Masoleh | A new bit-serial architecture for field multiplication using polynomial bases |
CN103186360B (en) | Fast arithmetic multi-bit serial pulse dual-base binary finite field multiplier |
Kim et al. | Computation of AB^2 multiplication in GF(2^m) using low-complexity systolic architecture |
CN103942027A (en) | Reconfigurable rapid parallel multiplier |
Lee | Super Digit-Serial Systolic Multiplier over GF(2^m) |
Meher | Systolic formulation for low-complexity serial-parallel implementation of unified finite field multiplication over GF(2^m) |
Talapatra et al. | Unified digit serial systolic Montgomery multiplication architecture for special classes of polynomials over GF(2^m) |
Saravanan et al. | Performance analysis of reversible finite field arithmetic architectures over GF(p) and GF(2^m) in elliptic curve cryptography |
Mozhi et al. | Efficient bit-parallel systolic multiplier over GF(2^m) |
Meher | High-throughput hardware-efficient digit-serial architecture for field multiplication over GF(2^m) |
Jeon et al. | Low-power exponent architecture in finite fields |
Pradhan et al. | Digit-Size Selection for FPGA Implementation of Generic Digit-Serial Multiplication Over GF(2^m) |
Hariri et al. | Digit-level semi-systolic and systolic structures for the shifted polynomial basis multiplication over binary extension fields |
Pillutla et al. | High-throughput area-delay-efficient systolic multiplier over GF(2^m) for a class of trinomials |
Rashmi et al. | Optimized reversible Montgomery multiplier |
Trujillo-Olaya et al. | Hardware architectures for elliptic curve cryptoprocessors using polynomial and Gaussian normal basis over GF(2^233) |
Trujillo-Olaya et al. | Half-matrix normal basis multiplier over GF(p^m) |
Ku et al. | ModSRAM: Algorithm-Hardware Co-Design for Large Number Modular Multiplication in SRAM |
Dake et al. | Implementation of high-throughput digit-serial redundant basis multiplier over finite field |
Rajalakshmi et al. | Low-complexity systolic design for finite field multiplier |
Fournaris et al. | Low area elliptic curve arithmetic unit |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2016-08-03; termination date: 2018-04-03 |