CN103186360A - Fast arithmetic multi-bit serial pulse dual-base binary finite field multiplier - Google Patents

Info

Publication number
CN103186360A
CN103186360A CN2013101154017A CN201310115401A CN103186360A CN 103186360 A CN103186360 A CN 103186360A CN 2013101154017 A CN2013101154017 A CN 2013101154017A CN 201310115401 A CN201310115401 A CN 201310115401A CN 103186360 A CN103186360 A CN 103186360A
Authority
CN
China
Prior art keywords
module
input
result
xor
frrp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013101154017A
Other languages
Chinese (zh)
Other versions
CN103186360B (en
Inventor
潘正祥
杨春生
白忠海
李秋莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology Shenzhen
Original Assignee
Harbin Institute of Technology Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology Shenzhen filed Critical Harbin Institute of Technology Shenzhen
Priority to CN201310115401.7A priority Critical patent/CN103186360B/en
Publication of CN103186360A publication Critical patent/CN103186360A/en
Application granted granted Critical
Publication of CN103186360B publication Critical patent/CN103186360B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention relates to a fast-operation multi-bit serial systolic double-basis binary finite field multiplier comprising an input B, k PE modules connected in series, an FRRP module, and an R3 module. The k PE modules operate over k cycles. In the first cycle, the input of A is $A_0, A_1, \ldots, A_{k-1}$ and B is input directly; the result is reconstructed by the FRRP module and stored in register C. In the second cycle, the input of A is $A_k, A_{k+1}, \ldots, A_{2k-1}$ and B is input through the R3 module; the result is again reconstructed by the FRRP module, added to the result of the first cycle, and stored in register C. Continuing in this way, in the k-th cycle the input of A is $A_{k(k-1)}, A_{k(k-1)+1}, \ldots, A_{k^2-1}$ and B is input after passing through the R3 module (k-1) times; the result is reconstructed by the FRRP module, added to the accumulated result of the previous (k-1) cycles, and stored in register C, which then outputs the result.

Description

Fast-operation multi-bit serial systolic double-basis binary finite field multiplier

Technical field

The invention relates to binary finite field multipliers, and in particular to a fast-operation multi-bit serial systolic double-basis binary finite field multiplier.

Background art

In recent years, elliptic curve cryptography (ECC) [1], [2] has become closely tied to cryptographic research. With the adoption of ECC in public-key cryptosystems, a number of hardware implementation issues have been raised. NIST recommends five binary fields for ECC: $GF(2^{163})$, $GF(2^{233})$, $GF(2^{283})$, $GF(2^{409})$, and $GF(2^{571})$. In ECC-based cryptographic protocols, finite field multiplication is an essential operation in computing point multiplication. The efficiency of cryptosystem hardware is generally judged by its area, power consumption, and performance.

For high-speed very-large-scale integration (VLSI) implementations, a systolic array structure is the preferred choice. Over binary extension fields, a variety of efficient systolic array multipliers have been designed; they can be classified into bit-parallel and bit-serial architectures. Efficient bit-parallel systolic multipliers typically use an LSB-first or MSB-first algorithm. Their main advantage is high throughput over the whole computation. However, for a field defined by a general polynomial, these structures require $O(m^2)$ XOR gates, $O(m^2)$ AND gates, $O(m^2)$ one-bit latches, and $O(m)$ latency. To reduce time and space complexity, Lee [8], [9], [13] showed that, for certain special polynomials such as all-one polynomials, pentanomials, and trinomials, finite field multiplication can be expressed as a Toeplitz matrix-vector product (TMVP) and used to build bit-parallel systolic multipliers. Bit-serial systolic array multipliers need only $O(m)$ space complexity, but they incur longer computation latency.

As a trade-off between time and space complexity, digit-serial systolic multipliers, which sit between bit-parallel and bit-serial multipliers, have been disclosed. A digit-serial polynomial-basis multiplier with a digit-serial-in, parallel-out structure was proposed in [20]. In such a multiplier, the m bits of a field element are divided into [Figure GDA00003008658000011] sub-words of d bits each. In each clock cycle one d-bit digit is processed, until the m-bit multiplication is complete. A scalable systolic multiplier built on an inherent d×d-bit parallel Hankel matrix-vector product was proposed in [15], [16]; its latency is [Figure GDA00003008658000021] clock cycles. Multi-bit serial systolic multipliers that use different structures internally and externally have also been presented in the literature. The latency of these multipliers is [...] clock cycles. As noted above, the design of low-complexity systolic finite field multipliers depends on the choice of irreducible polynomial and of basis representation, and these digit-serial multipliers need a high latency to complete a multiplication.

Summary of the invention

The technical problem solved by the invention is to construct a fast-operation multi-bit serial systolic double-basis binary finite field multiplier that overcomes the high latency existing multipliers need to complete a multiplication.

The technical solution of the invention is to construct a fast-operation multi-bit serial systolic double-basis binary finite field multiplier comprising an input B, k PE modules connected in series, an FRRP module, and an R3 module. The k PE modules operate over k cycles. In the first cycle, the input of A is $A_0, A_1, \ldots, A_{k-1}$ and B is input directly; the result is reconstructed by the FRRP module and stored in register C. In the second cycle, the input of A is $A_k, A_{k+1}, \ldots, A_{2k-1}$ and B is input through the R3 module; the result is again reconstructed by the FRRP module, added to the result of the first cycle, and stored in register C. Continuing in this way, in the k-th cycle the input of A is $A_{k(k-1)}, A_{k(k-1)+1}, \ldots, A_{k^2-1}$ and B is input after passing through the R3 module (k-1) times; the result is reconstructed by the FRRP module, added to the accumulated result of the previous (k-1) cycles, and stored in register C, which then outputs the result.

The R3 module computes $Bx^{kd} \bmod F(x)$. Each PE module comprises an R1 module, a CMP module, a CVP module, a PWM module, [Figure GDA00003008658000023] XOR gates, and a number of latches. The output of the R3 module is fed to the R1 module and then coefficient-converted by the CMP module, while the segments of A are input to the CVP module for coefficient conversion. The outputs of the CMP and CVP modules are both input to the PWM module, which computes the product of $B_{in}$ and the segment of A. The products are accumulated through [Figure GDA00003008658000025] XOR gates and stored in latches, and [Figure GDA00003008658000027] latches output the result [Figure GDA00003008658000028].

Here A is an element of the field defined by the trinomial $F(x) = 1 + x^n + x^m$ and is written $A = a_0 + a_1x + \cdots + a_{m-1}x^{m-1}$, with m coefficients $(a_0, a_1, \ldots, a_{m-1})$. The m-bit A is cut into segments [Figure GDA000030086580000211] of d bits each, $k^2$ segments in total, so that [Figure GDA00003008658000029]. B is expressed in the double basis as $B = b_0\beta_0 + b_1\beta_1 + \cdots + b_{m-1}\beta_{m-1}$ and is the other input of the multiplier; C is the output.

In a further technical solution of the invention, the FRRP module comprises an FR module and an R2 module. The R2 module computes $C \bmod (x^m + 1)$. The FR module takes the results of the k serially connected PE modules as input, reconstructs them, and outputs to the R2 module.

In a further technical solution of the invention, the CMP module comprises XOR gates XOR_1 and XOR_2 connected in parallel.

In a further technical solution of the invention, the CVP module is the XOR gate XOR_3.

In a further technical solution of the invention, the PWM module comprises three AND gates AND_1, AND_2, and AND_3 connected in parallel, which multiply the outputs of the CMP module and the CVP module point by point.

In a further technical solution of the invention, the FR module comprises two XOR gates XOR_4 and XOR_5 connected in parallel.

The technical effect of the invention is as follows. A fast-operation multi-bit serial systolic double-basis binary finite field multiplier is constructed, comprising an input B, k PE modules connected in series, an FRRP module, and an R3 module. The k PE modules operate over k cycles. In the first cycle, the input of A is $(A_0, A_1, \ldots, A_{k-1})$ and B is input directly; the result is reconstructed by the FRRP module and stored in register C. In the second cycle, the input of A is $(A_k, A_{k+1}, \ldots, A_{2k-1})$ and B is input through the R3 module; the result is again reconstructed by the FRRP module, added to the result of the first cycle, and stored in register C. Continuing in this way, in the k-th cycle the input of A is $(A_{k(k-1)}, A_{k(k-1)+1}, \ldots, A_{k^2-1})$ and B is input after passing through the R3 module (k-1) times; the result is reconstructed by the FRRP module, added to the accumulated result of the previous (k-1) cycles, and stored in register C, which then outputs the result.

The invention combines the polynomial basis and the MPB to build a double-basis multiplication. In a bit-parallel structure, some finite field multiplications can be obtained through a subquadratic TMVP. Over the binary field $GF(2^m)$, irreducible trinomials and pentanomials are widely used in cryptography, where the field size is usually large. The invention proposes a new digit-serial systolic double-basis multiplier based on the subquadratic TMVP formula; once a d×d Toeplitz multiplication is chosen, the proposed structure achieves a very low latency of [Figure GDA00003008658000031] clock cycles.

Brief description of the drawings

Fig. 1 is a structural schematic diagram of the invention.

Fig. 2 is a structural diagram of the multi-bit serial systolic multiplier of the invention.

Fig. 3 is a structural diagram of the processing element PE of the invention.

Fig. 4 is a detailed circuit diagram of the PE module of the invention.

Detailed description of the embodiments

The technical solution of the invention is further described below with reference to specific embodiments.

As shown in Fig. 2, a specific embodiment of the invention is a fast-operation multi-bit serial systolic double-basis binary finite field multiplier comprising an input B, k PE modules connected in series, an FRRP module, and an R3 module. The k PE modules operate over k cycles. In the first cycle, the input of A is $A_0, A_1, \ldots, A_{k-1}$ and B is input directly; the result is reconstructed by the FRRP module and stored in register C. In the second cycle, the input of A is $A_k, A_{k+1}, \ldots, A_{2k-1}$ and B is input through the R3 module; the result is again reconstructed by the FRRP module, added to the result of the first cycle, and stored in register C. Continuing in this way, in the k-th cycle the input of A is $A_{k(k-1)}, A_{k(k-1)+1}, \ldots, A_{k^2-1}$ and B is input after passing through the R3 module (k-1) times; the result is reconstructed by the FRRP module, added to the accumulated result of the previous (k-1) cycles, and stored in register C, which then outputs the result.

The R3 module computes $Bx^{kd} \bmod F(x)$. Each PE module comprises an R1 module, a CMP module, a CVP module, a PWM module, [Figure GDA00003008658000041] XOR gates, and a number of latches. The output of the R3 module is fed to the R1 module and then coefficient-converted by the CMP module, while the segments of A are input to the CVP module for coefficient conversion. The outputs of the CMP and CVP modules are both input to the PWM module, which computes the product of $B_{in}$ and the segment of A. The products are accumulated through [Figure GDA00003008658000043] XOR gates and stored in [Figure GDA00003008658000044] latches, and [Figure GDA00003008658000045] latches output the result [Figure GDA00003008658000046].

Here A is an element of the field defined by the trinomial $F(x) = 1 + x^n + x^m$ and is written $A = a_0 + a_1x + \cdots + a_{m-1}x^{m-1}$, with m coefficients $(a_0, a_1, \ldots, a_{m-1})$. The m-bit A is cut into segments [Figure GDA00003008658000049] of d bits each, $k^2$ segments in total, so that [Figure GDA00003008658000047]. B is expressed in the double basis as $B = b_0\beta_0 + b_1\beta_1 + \cdots + b_{m-1}\beta_{m-1}$ and is the other input of the multiplier; C is the output.

In a preferred embodiment of the invention, the FRRP module comprises an FR module and an R2 module. The R2 module computes $C \bmod (x^m + 1)$. The FR module takes the results of the k serially connected PE modules as input, reconstructs them, and outputs to the R2 module.
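The reduction performed by the R2 module can be sketched in software. The snippet below is a minimal illustration, assuming that a polynomial over GF(2) is stored as a Python integer whose bit i is the coefficient of $x^i$; the function name is hypothetical, and the FR reconstruction step and the double-basis coordinate mapping of the patent are not modeled.

```python
def reduce_mod_xm_plus_1(c: int, m: int) -> int:
    """Reduce a GF(2) polynomial (bit i = coefficient of x^i) modulo x^m + 1.

    Assumption: plain bit-vector polynomial representation, not the
    patent's double-basis coordinates.
    """
    mask = (1 << m) - 1
    while c >> m:
        # x^m is congruent to 1, so the bits at positions >= m fold back
        # onto the low m bits with an XOR (addition in GF(2)).
        c = (c & mask) ^ (c >> m)
    return c


# Example with m = 4: x^5 + x^4 + x  ->  x + 1 + x  =  1
print(reduce_mod_xm_plus_1(0b110010, 4))
```

Because $x^m \equiv 1$, the reduction needs only XORs of the folded bits, with no shifts that depend on the middle term of F(x).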

The inputs of the CMP module and the CVP module are $B_{in}$ and [Figure GDA00003008658000051] respectively, and their outputs both feed the PWM module. The output of the PWM module passes through XOR gates and [Figure GDA00003008658000053] latches to give the result [Figure GDA00003008658000054]. The input of the R1 module is $B_{in}$; its output passes through m latches to give $B_{out}$. The input of the CMP module is $Bx^{dk(i+1)+jd}$ and its output is $[B^{(p+q)}, B^{(p+q+1)}, \ldots, B^{(p+q+d-1)}]$. The input of the CVP module is $A_{ik+j}$ and its output is $[a_q, a_{q+1}, \ldots, a_{q+d-1}]^T$. Here [Figure GDA00003008658000055] denotes the number of rows and columns when [Figure GDA000030086580000510] is arranged as a matrix; $i, j = 0, 1, \ldots, k-1$, where i is the row index and j is the column index of that matrix; $p = dk(i+1)+jd$; $q = (ik+j)d$; and T denotes the transpose of $[a_q, a_{q+1}, \ldots, a_{q+d-1}]$. The output is accumulated with the result of the previous FRRP module and passed on to the next FRRP module.

Fig. 1 shows the structure of the full systolic array double-basis multiplier. A, B, and C are three elements of $GF(2^m)$, the field generated by the irreducible trinomial $F(x) = 1 + x^n + x^m$ with $n \le m/2$. Element A is represented in the polynomial basis, and B and C are represented in the double basis. The multiplier computes $C = AB \bmod F(x)$, with A and B as inputs and C as the output. A is written $A = a_0 + a_1x + \cdots + a_{m-1}x^{m-1}$, with m coefficients $(a_0, a_1, \ldots, a_{m-1})$. The m-bit A is cut into segments of d bits each, $k^2$ segments in total. Each segment $A_i$ can be written $A_i = a_{id} + a_{id+1}x + \cdots + a_{id+d-1}x^{d-1}$, and the segments [Figure GDA00003008658000058] replace A as the input of the multiplier. B is expressed in the double basis as $B = b_0\beta_0 + b_1\beta_1 + \cdots + b_{m-1}\beta_{m-1}$ and is the other input of the multiplier. C is the output, computed as $C = AB \bmod F(x)$, which is the function of the whole multiplier.

Since A is split into $k^2$ segments, it can be written as $A = A_0 + A_1x^d + \cdots + A_{k^2-1}x^{(k^2-1)d}$. Expanding A in $C = AB \bmod F(x)$ therefore gives

$$
\begin{aligned}
C &= AB \bmod F(x) \\
  &= B\left(A_0 + A_1x^d + \cdots + A_{k^2-1}x^{(k^2-1)d}\right) \bmod F(x) \\
  &= \Big(B\left(A_0 + A_1x^d + \cdots + A_{k-1}x^{(k-1)d}\right) \\
  &\qquad + Bx^{dk}\left(A_k + A_{k+1}x^d + \cdots + A_{2k-1}x^{(k-1)d}\right) + \cdots \\
  &\qquad + Bx^{dk(k-1)}\left(A_{k(k-1)} + A_{k(k-1)+1}x^d + \cdots + A_{k^2-1}x^{(k-1)d}\right)\Big) \bmod F(x) \\
  &= (C_0 + C_1 + \cdots + C_{k-1}) \bmod F(x),
\end{aligned}
$$

where

$$
\begin{aligned}
C_0 &= B\left(A_0 + A_1x^d + \cdots + A_{k-1}x^{(k-1)d}\right), \\
C_1 &= Bx^{dk}\left(A_k + A_{k+1}x^d + \cdots + A_{2k-1}x^{(k-1)d}\right), \\
&\;\;\vdots \\
C_{k-1} &= Bx^{dk(k-1)}\left(A_{k(k-1)} + A_{k(k-1)+1}x^d + \cdots + A_{k^2-1}x^{(k-1)d}\right).
\end{aligned}
$$
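The row-wise decomposition of C can be checked numerically. The sketch below makes several assumptions that are not in the patent: both operands are kept in the polynomial basis (the double-basis coordinates of B and C are not modeled), polynomials are Python integers with bit i holding the coefficient of $x^i$, the field is the NIST trinomial field with $F(x) = x^{233} + x^{74} + 1$, and the values k = 4, d = 15 (so that $k^2 d \ge m$, with A zero-padded) are purely illustrative. All function names are hypothetical.

```python
import random

M, N = 233, 74                      # assumed example: NIST trinomial x^233 + x^74 + 1
F = (1 << M) | (1 << N) | 1
K, D = 4, 15                        # illustrative sizes with K*K*D >= M


def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product of two bit-vector polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def polymod(c: int, f: int = F) -> int:
    """Remainder of c modulo f in GF(2)[x]."""
    deg = f.bit_length() - 1
    while c.bit_length() > deg:
        c ^= f << (c.bit_length() - 1 - deg)
    return c


def mulmod(a: int, b: int) -> int:
    return polymod(clmul(a, b))


def split_segments(a: int, k: int = K, d: int = D) -> list:
    """Cut A into k*k segments of d bits each (A_0 is the lowest d bits)."""
    mask = (1 << d) - 1
    return [(a >> (i * d)) & mask for i in range(k * k)]


def row_sums(a: int, b: int, k: int = K, d: int = D) -> list:
    """Return [C_0, ..., C_{k-1}], each already reduced modulo F(x)."""
    segs = split_segments(a, k, d)
    rows = []
    for i in range(k):
        inner = 0
        for j in range(k):
            inner ^= segs[i * k + j] << (j * d)   # A_{ik+j} * x^{jd}
        b_shift = polymod(b << (d * k * i))       # B * x^{dki} mod F(x)
        rows.append(mulmod(b_shift, inner))
    return rows


random.seed(1)
a, b = random.getrandbits(M), random.getrandbits(M)
c = 0
for row in row_sums(a, b):
    c ^= row                                      # addition in GF(2^m) is XOR
print(c == mulmod(a, b))
```

Each row needs only B multiplied by a fixed power $x^{dk}$ relative to the previous row, which is why a single R3 stage per row is enough.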

In the full multiplier structure of Fig. 1, the first row computes $C_0 = B(A_0 + A_1x^d + \cdots + A_{k-1}x^{(k-1)d})$: its first processing element $PE_{0,0}$ computes the product $BA_0$, its second processing element $PE_{0,1}$ computes $BA_1x^d$, and so on, up to the k-th processing element $PE_{0,k-1}$, which computes $BA_{k-1}x^{(k-1)d}$. The results of the k processing elements are accumulated to give $C_0$, which is input to the first FRRP (Final Reconstruction-Reduction-Polynomial) module. Similarly, the second row computes $C_1 = Bx^{dk}(A_k + A_{k+1}x^d + \cdots + A_{2k-1}x^{(k-1)d})$. The added R3 module computes $Bx^{dk} \bmod F(x)$ from the input B. The first processing element of this row, $PE_{1,0}$, computes the product $Bx^{dk}A_k$, and the remaining elements proceed as in the first row. The result $C_1$ is input to the second FRRP module and accumulated with the output of the first FRRP module to give $(C_0 + C_1) \bmod F(x)$. Every row is computed in the same way down to the k-th row, whose R3 module outputs $Bx^{dk(k-1)} \bmod F(x)$. The k-th FRRP module receives $C_{k-1}$ and outputs $(C_0 + C_1 + \cdots + C_{k-1}) \bmod F(x)$, which is the result of the whole multiplier, $C = (C_0 + C_1 + \cdots + C_{k-1}) \bmod F(x)$.

The detailed circuit of each processing element $PE_{i,j}$, which computes the product $Bx^{dk(i+1)+jd}A_{ik+j}$, is shown in Fig. 3. $A_{in}$, $B_{in}$, and [Figure GDA00003008658000062] are its inputs; $B_{out}$ and the accumulated result are its outputs.

For the first processing element of each row, $PE_{i,0}$, $A_{in}$ is $A_{ik}$, $B_{in}$ is the output of the (i+1)-th R3 module, namely $Bx^{dk(i+1)} \bmod F(x)$, and [Figure GDA00003008658000064] is initialized to 0. $B_{out}$, the output of R1, is $Bx^{dk(i+1)+d} \bmod F(x)$ and is also the input of the second processing element $PE_{i,1}$. [Figure GDA00003008658000071] outputs the result of [Figure GDA00003008658000072], that is, the product $Bx^{dk(i+1)}A_{ik}$.

For the second processing element of each row, $PE_{i,1}$, $A_{in}$ is $A_{ik+1}$, $B_{in}$ is $Bx^{dk(i+1)+d} \bmod F(x)$, and [Figure GDA00003008658000073] is the result computed by the first processing element $PE_{i,0}$, namely $Bx^{dk(i+1)}A_{ik}$. $B_{out}$ outputs $Bx^{dk(i+1)+2d} \bmod F(x)$, which is the input $B_{in}$ of the third processing element $PE_{i,2}$, and [Figure GDA00003008658000075] outputs the product $Bx^{dk(i+1)+d}A_{ik+1}$, which is the input [Figure GDA00003008658000074] of $PE_{i,2}$.

In general, the (j+1)-th processing element of each row, $PE_{i,j}$, computes the product $Bx^{dk(i+1)+jd}A_{ik+j}$: $A_{in}$ is $A_{ik+j}$, $B_{in}$ is $Bx^{dk(i+1)+jd} \bmod F(x)$, [Figure GDA00003008658000076] is the output [Figure GDA00003008658000077] of the j-th processing element, namely $Bx^{dk(i+1)+(j-1)d}A_{ik+(j-1)}$, $B_{out}$ outputs $Bx^{dk(i+1)+(j+1)d} \bmod F(x)$, and [Figure GDA00003008658000078] outputs the product $Bx^{dk(i+1)+jd}A_{ik+j}$.

Expanding $Bx^{dk(i+1)+jd}$ and $A_{ik+j}$ separately, that is, $Bx^{dk(i+1)+jd} = (b_0\beta_0 + b_1\beta_1 + \cdots + b_{m-1}\beta_{m-1})x^{dk(i+1)+jd}$ and $A_{ik+j} = a_{(ik+j)d} + a_{(ik+j)d+1}x + \cdots + a_{(ik+j)d+d-1}x^{d-1}$, the rules of double-basis multiplication give

$$
\begin{aligned}
Bx^{dk(i+1)+jd}A_{ik+j}
 &= (b_0\beta_0 + b_1\beta_1 + \cdots + b_{m-1}\beta_{m-1})\,x^{dk(i+1)+jd}A_{ik+j} \\
 &= \left(b_0^{(p)}\beta_0 + b_1^{(p)}\beta_1 + \cdots + b_{m-1}^{(p)}\beta_{m-1}\right)A_{ik+j} \\
 &= \left(a_{(ik+j)d} + a_{(ik+j)d+1}x + \cdots + a_{(ik+j)d+d-1}x^{d-1}\right)B^{(p)} \\
 &= a_qB^{(p)} + a_{q+1}xB^{(p)} + \cdots + a_{q+d-1}x^{d-1}B^{(p)} \\
 &= a_qB^{(p+q)} + a_{q+1}B^{(p+q+1)} + \cdots + a_{q+d-1}B^{(p+q+d-1)} \\
 &= \left[B^{(p+q)}, B^{(p+q+1)}, \ldots, B^{(p+q+d-1)}\right]\left[a_q, a_{q+1}, \ldots, a_{q+d-1}\right]^T,
\end{aligned}
$$

where

$$
p = dk(i+1)+jd, \qquad q = (ik+j)d, \qquad B^{(p)} = b_0^{(p)}\beta_0 + b_1^{(p)}\beta_1 + \cdots + b_{m-1}^{(p)}\beta_{m-1}.
$$

In the detailed circuit of the processing element $PE_{i,j}$ in Fig. 3, the input of the CMP module is $Bx^{dk(i+1)+jd}$ and its output is $[B^{(p+q)}, B^{(p+q+1)}, \ldots, B^{(p+q+d-1)}]$; the input of the CVP module is $A_{ik+j}$ and its output is $[a_q, a_{q+1}, \ldots, a_{q+d-1}]^T$. The PWM module computes the product $[B^{(p+q)}, B^{(p+q+1)}, \ldots, B^{(p+q+d-1)}][a_q, a_{q+1}, \ldots, a_{q+d-1}]^T$, which is then added to [Figure GDA00003008658000079]; the sum is stored in register L and output from register L. The input of the R1 module is $B_{in}$; it computes $x^dB_{in} \bmod F(x)$, and the result is stored in register L and output from it as $B_{out}$.
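The data flow of one processing element can be sketched as a shift-and-add step. This is a simplified model under stated assumptions, not the patent's circuit: it works in the polynomial basis on Python integers (bit i is the coefficient of $x^i$), it replaces the CMP/CVP/PWM Toeplitz datapath with a plain bit-serial product, and the names `pe_step`, `a_seg`, `b_in`, and `c_in` are hypothetical.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product, used here only as a reference."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def polymod(c: int, f: int) -> int:
    """Remainder of c modulo f in GF(2)[x]."""
    deg = f.bit_length() - 1
    while c.bit_length() > deg:
        c ^= f << (c.bit_length() - 1 - deg)
    return c


def pe_step(a_seg: int, b_in: int, c_in: int, d: int, f: int):
    """One PE: accumulate a_seg * b_in into c_in and pass on x^d * b_in mod f.

    Returns (b_out, c_out). Assumes b_in is already reduced modulo f.
    """
    prod = 0
    shifted = b_in
    for t in range(d):
        if (a_seg >> t) & 1:
            prod ^= shifted                   # adds a_{q+t} * x^t * B_in
        shifted = polymod(shifted << 1, f)    # multiply by x, then reduce
    c_out = c_in ^ prod                       # accumulate with the incoming sum
    b_out = shifted                           # equals x^d * B_in mod f (R1 role)
    return b_out, c_out


# Toy field GF(2^7) with F(x) = x^7 + x + 1, segment width d = 3
F7 = 0b10000011
b_out, c_out = pe_step(0b101, 0b1010011, 0, 3, F7)
print(c_out == polymod(clmul(0b1010011, 0b101), F7))
print(b_out == polymod(0b1010011 << 3, F7))
```

Passing `b_out` and `c_out` to the next element in the row reproduces the chaining described above, with each element seeing B advanced by a further factor of $x^d$.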

The product $[B^{(p+q)}, B^{(p+q+1)}, \ldots, B^{(p+q+d-1)}][a_q, a_{q+1}, \ldots, a_{q+d-1}]^T$ is a Toeplitz matrix-vector product, so it can be split as $\begin{bmatrix} t_1 & t_2 \\ t_0 & t_1 \end{bmatrix}\begin{bmatrix} v_0 \\ v_1 \end{bmatrix}$. Here the Toeplitz matrix $[B^{(p+q)}, B^{(p+q+1)}, \ldots, B^{(p+q+d-1)}]$ is divided into four blocks, two of which are the same block $t_1$ while the other two are $t_0$ and $t_2$, and the vector $[a_q, a_{q+1}, \ldots, a_{q+d-1}]^T$ is divided into two halves $v_0$ and $v_1$; T denotes transposition. This gives

$$
\begin{aligned}
\left[B^{(p+q)}, B^{(p+q+1)}, \ldots, B^{(p+q+d-1)}\right]\left[a_q, a_{q+1}, \ldots, a_{q+d-1}\right]^T
 &= \begin{bmatrix} t_1 & t_2 \\ t_0 & t_1 \end{bmatrix}\begin{bmatrix} v_0 \\ v_1 \end{bmatrix} \\
 &= \begin{bmatrix} t_1(v_0+v_1) + v_1(t_2+t_1) \\ t_1(v_0+v_1) + v_0(t_0+t_1) \end{bmatrix} \\
 &= \begin{bmatrix} c_0 \\ c_1 \end{bmatrix}.
\end{aligned}
$$
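The two-way split can be checked by expanding each entry over GF(2), where addition is XOR and any term that appears twice cancels. The following worked expansion is added for illustration and uses only the symbols already defined above:

```latex
\begin{aligned}
c_0 &= t_1(v_0+v_1) + v_1(t_2+t_1)
     = t_1v_0 + t_1v_1 + t_2v_1 + t_1v_1
     = t_1v_0 + t_2v_1, \\
c_1 &= t_1(v_0+v_1) + v_0(t_0+t_1)
     = t_1v_0 + t_1v_1 + t_0v_0 + t_1v_0
     = t_0v_0 + t_1v_1.
\end{aligned}
```

Both entries match the direct block product, and the shared term $t_1(v_0+v_1)$ means only three half-size products are needed instead of four.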

Fig. 4 shows the detailed circuits of the CMP, CVP, and PWM modules of the processing element PE. The input of the CMP module is $(t_0, t_1, t_2)$; XOR gates XOR_1 and XOR_2 produce the output $(t_0+t_1, t_1, t_1+t_2)$. The input of the CVP module is $(v_0, v_1)$; XOR gate XOR_3 produces the output $(v_0, v_0+v_1, v_1)$. The PWM module multiplies the outputs of the CMP and CVP modules point by point through the three AND gates AND_1, AND_2, and AND_3, giving $(v_0(t_0+t_1), t_1(v_0+v_1), v_1(t_2+t_1))$. The FR reconstruction module uses the two XOR gates XOR_4 and XOR_5 to compute $c_0 = t_1(v_0+v_1) + v_1(t_2+t_1)$ and $c_1 = t_1(v_0+v_1) + v_0(t_0+t_1)$, and outputs $(c_0, c_1)$.
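The gate-level flow through CMP, CVP, PWM, and FR can be simulated on single bits. The sketch below is only an illustration: it treats each block $t_i$ and $v_i$ as one bit (in the patent they are sub-blocks of the Toeplitz matrix and of the vector), the function names are hypothetical, and the assignment of XOR_4 to $c_0$ and XOR_5 to $c_1$ is an assumption.

```python
from itertools import product


def cmp_module(t0: int, t1: int, t2: int):
    """Component matrix formation: XOR_1 and XOR_2, with t1 passed through."""
    return (t0 ^ t1, t1, t1 ^ t2)


def cvp_module(v0: int, v1: int):
    """Component vector formation: XOR_3, with v0 and v1 passed through."""
    return (v0, v0 ^ v1, v1)


def pwm_module(cm, cv):
    """Point-wise multiplication: AND_1, AND_2, AND_3."""
    return tuple(x & y for x, y in zip(cm, cv))


def fr_module(p):
    """Final reconstruction from the three point-wise products."""
    p0, p1, p2 = p          # v0(t0+t1), t1(v0+v1), v1(t1+t2)
    c0 = p1 ^ p2            # assumed XOR_4: t1(v0+v1) + v1(t2+t1)
    c1 = p1 ^ p0            # assumed XOR_5: t1(v0+v1) + v0(t0+t1)
    return (c0, c1)


def tmvp2(t0, t1, t2, v0, v1):
    return fr_module(pwm_module(cmp_module(t0, t1, t2), cvp_module(v0, v1)))


# Exhaustive check against the direct 2x2 Toeplitz product over GF(2)
ok = all(
    tmvp2(t0, t1, t2, v0, v1) == ((t1 & v0) ^ (t2 & v1), (t0 & v0) ^ (t1 & v1))
    for t0, t1, t2, v0, v1 in product((0, 1), repeat=5)
)
print(ok)
```

The check covers all 32 input combinations, so it confirms the three-AND, five-XOR arrangement for the single-bit case.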

Figure 2 shows the multi-bit serial systolic multiplier architecture proposed by the present invention, which is obtained by folding the structure shown in Figure 1. Figure 1 uses $k^2$ processing elements (PEs), and the k PEs in every row have the same structure and function, so the k PEs of the first row can take the place of the PEs in the remaining rows; the computation then requires k cycles. In the 1st cycle the input of A is $(A_0, A_1, \ldots, A_{k-1})$ and B is input directly; the result passes through the FRRP reconstruction module and is stored in register C. In the 2nd cycle the input of A is $(A_k, A_{k+1}, \ldots, A_{2k-1})$ and B is input through the R3 module; the result again passes through the FRRP reconstruction module, is added to the result of the 1st cycle, and is stored in register C. This continues until the k-th cycle, in which the input of A is [Figure GDA00003008658000088] and B is input after passing through the R3 module (k-1) times; the result passes through the FRRP reconstruction module, is added to the accumulated result of the previous (k-1) cycles, and is stored in register C, which then outputs the result C = AB mod F(x).
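The k-cycle folded schedule can be illustrated with a small behavioral model. This is a sketch under loudly stated assumptions, not the patented circuit: it works in the ordinary polynomial basis rather than the dual basis, it assumes the R3 step advances B by a factor of $x^{kd}$ modulo F(x), and it uses a plain modular reduction as a stand-in for the FRRP module. The names `clmul`, `gf2_mod` and `folded_multiply` are hypothetical.

```python
def clmul(a, b):
    """Carry-less product of two GF(2) polynomials stored as Python ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def gf2_mod(a, f):
    """Reduce polynomial a modulo f over GF(2)."""
    deg_f = f.bit_length() - 1
    while a.bit_length() - 1 >= deg_f:
        a ^= f << (a.bit_length() - 1 - deg_f)
    return a


def folded_multiply(a, b, f, d, k):
    """Compute a*b mod f in k cycles, each using k d-bit segments of a."""
    mask = (1 << d) - 1
    c = 0       # register C, accumulated by XOR
    b_cur = b   # B as presented to the PE row in the current cycle
    for cycle in range(k):
        partial = 0
        for j in range(k):  # the k PEs of the single physical row
            seg = (a >> ((cycle * k + j) * d)) & mask
            partial ^= clmul(seg, b_cur) << (j * d)
        c ^= gf2_mod(partial, f)               # assumed stand-in for FRRP
        b_cur = gf2_mod(b_cur << (k * d), f)   # assumed stand-in for R3
    return c


F = 0b10011  # x^4 + x + 1, so m = 4 = k*k*d with k = 2, d = 1
print(folded_multiply(0b1000, 0b0010, F, d=1, k=2))  # x^3 * x = x + 1, prints 3
```

The point of the model is the schedule: one row of k units is reused k times, B is updated once per cycle, and register C only ever XOR-accumulates, which is why the folded design trades a factor of k in area for k cycles of latency.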

The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the invention shall not be regarded as limited to these descriptions. For a person of ordinary skill in the technical field of the invention, simple deductions or substitutions made without departing from the concept of the invention shall all be regarded as falling within the protection scope of the invention.

Claims (6)

1. A fast-operation multi-bit serial systolic dual-basis binary finite field multiplier, characterized in that it comprises an input B, k PE modules, an FRRP module and an R3 module, the k PE modules being connected in series and operating over k cycles: in the 1st cycle the input of A is [Figure 893260DEST_PATH_IMAGE001], B is input directly, and the result is reconstructed by the FRRP module and stored in register C; in the 2nd cycle the corresponding input of A is applied, B is input through the R3 module, and the result is likewise reconstructed by the FRRP module, added to the result of the 1st cycle, and stored in register C; and so on, such that in the k-th cycle the input of A is [Figure 199531DEST_PATH_IMAGE003], B is input after passing through the R3 module (k-1) times, and the result is reconstructed by the FRRP module, added to the accumulated result of the (k-1) preceding cycles, and stored in register C, which then outputs the result; the R3 module implements the computation of [Figure 275107DEST_PATH_IMAGE004]; the PE module comprises an R1 module, a CMP module, a CVP module, a PWM module, XOR gates and latches; the output of the R3 module is passed to the R1 module and then undergoes coefficient conversion in the CMP module; the segments of A are input to the CVP module, which performs coefficient conversion of the segments of A; the results of the CMP module and the CVP module are both input to the PWM module, which computes the product of [Figure 870277DEST_PATH_IMAGE007] and the segments of A; the products are accumulated through [Figure 730917DEST_PATH_IMAGE005] XOR gates and the result is stored in [Figure 476894DEST_PATH_IMAGE005] latches, the [Figure 422984DEST_PATH_IMAGE005] latches outputting the result [Figure 414074DEST_PATH_IMAGE008];
wherein A is represented, by means of the trinomial [Figure 690072DEST_PATH_IMAGE009], as [Figure 108415DEST_PATH_IMAGE010], with m coefficients in total, namely [Figure 338539DEST_PATH_IMAGE011];
using the segment-cutting method, the m-bit A is cut into [Figure 133320DEST_PATH_IMAGE012], each segment having d bits, for a total of $k^2$ segments, so that [Figure 201508DEST_PATH_IMAGE013]; B can be represented in the dual basis as [Figure 728435DEST_PATH_IMAGE014] and serves as the other input of the multiplier; C is the output result.
2. The fast-operation multi-bit serial systolic dual-basis binary finite field multiplier according to claim 1, characterized in that the FRRP module comprises an FR module and an R2 module; the R2 module implements the computation of [Figure 773752DEST_PATH_IMAGE015]; the input of the FR module is the result computed by the k series-connected PE modules, which the FR module reconstructs and outputs to the R2 module.
3. The fast-operation multi-bit serial systolic dual-basis binary finite field multiplier according to claim 1, characterized in that the CMP module comprises XOR gates XOR_1 and XOR_2, the XOR gates XOR_1 and XOR_2 being connected in parallel.
4. The fast-operation multi-bit serial systolic dual-basis binary finite field multiplier according to claim 1, characterized in that the CVP module is an XOR gate XOR_3.
5. The fast-operation multi-bit serial systolic dual-basis binary finite field multiplier according to claim 1, characterized in that the PWM module comprises three parallel AND gates AND_1, AND_2 and AND_3, which multiply the outputs of the CMP module and the CVP module point-wise.
6. The fast-operation multi-bit serial systolic dual-basis binary finite field multiplier according to claim 1, characterized in that the FR module comprises two parallel XOR gates XOR_4 and XOR_5.
CN201310115401.7A 2013-04-03 2013-04-03 Fast arithmetic multi-bit serial pulse dual-base binary finite field multiplier Expired - Fee Related CN103186360B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310115401.7A CN103186360B (en) 2013-04-03 2013-04-03 Binary system Galois field multiplier at the bottom of rapid computations many bits series connection pulsation double-basis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310115401.7A CN103186360B (en) 2013-04-03 2013-04-03 Binary system Galois field multiplier at the bottom of rapid computations many bits series connection pulsation double-basis

Publications (2)

Publication Number Publication Date
CN103186360A true CN103186360A (en) 2013-07-03
CN103186360B CN103186360B (en) 2016-08-03

Family

ID=48677539

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310115401.7A Expired - Fee Related CN103186360B (en) 2013-04-03 2013-04-03 Fast arithmetic multi-bit serial pulse dual-base binary finite field multiplier

Country Status (1)

Country Link
CN (1) CN103186360B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104252332A (en) * 2014-08-20 2014-12-31 哈尔滨工业大学深圳研究生院 Multiplier and multiplier processing element for ellipse cipher apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW527561B (en) * 2001-11-02 2003-04-11 Chiou-Ying Lee Low-complexity bit-parallel systolic multiplier over GF (2m)
TW200710716A (en) * 2006-11-24 2007-03-16 Univ Lunghwa Sci & Technology Low-complexity finite field GF(2m) bit-parallel systolic array dual-basis multiplier
CN102073477A (en) * 2010-11-29 2011-05-25 北京航空航天大学 Implementation method of finite field multiplying unit with functions of detecting, correcting and locating error
CN102929574A (en) * 2012-10-18 2013-02-13 复旦大学 Design method of systolic multiplier on GF(2163) domain

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHIOU-YNG LEE: "《Low-Complexity Bit-Parallel Systolic Montgomery Multipliers for Special Classes of GF(2^m)》", 《IEEE TRANSACTIONS ON COMPUTERS》, vol. 54, no. 9, 25 July 2005 (2005-07-25), pages 1061 - 1070 *
HAINING FAN ET AL.: "Subquadratic Computational Complexity Schemes for Extended Binary Field Multiplication Using Optimal Normal Bases", 《IEEE TRANSACTIONS ON COMPUTERS》, vol. 56, no. 10, 25 October 2007 (2007-10-25), pages 1435 - 1437, XP011191962, DOI: doi:10.1109/TC.2007.1076 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104252332A (en) * 2014-08-20 2014-12-31 哈尔滨工业大学深圳研究生院 Multiplier and multiplier processing element for ellipse cipher apparatus
CN104252332B (en) * 2014-08-20 2018-09-18 哈尔滨工业大学深圳研究生院 A kind of multiplier processing unit and multiplier for elliptic curves cryptosystem device

Also Published As

Publication number Publication date
CN103186360B (en) 2016-08-03

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160803

Termination date: 20180403
