CN1259617C - Method for accelerating the RSA encryption/decryption process, and modular multiplication and modular exponentiation circuits therefor - Google Patents


Info

Publication number: CN1259617C
Authority: CN
Grant status: Grant
Prior art keywords: register, operation, assigned, bit, modular multiplication
Application number: CN 03156754
Other languages: Chinese (zh)
Other versions: CN1492316A (en)
Inventors: 孙东昱, 龚宗跃, 赵红敏, 于鹏
Original Assignee: 大唐微电子技术有限公司

Abstract

The present invention discloses a method for accelerating the RSA encryption/decryption process, together with modular multiplication and modular exponentiation circuits that use the algorithm. The modular multiplication algorithm of the invention improves on the existing multi-precision CIOS algorithm: the two inner loops are merged into one, and the number of accesses to external variables is reduced. The modular multiplication circuit of the invention consists of addition, multiplication, address and loop arithmetic modules, data registers, a logic control module, internal wiring and some special-function modules. It executes the operations of the algorithm sequentially and needs fewer operation steps, which increases the computation speed; the length of the operand data can also be configured. The modular exponentiation circuit of the invention consists of the above modular multiplication circuit, a CPU and system RAM. The CPU controls the execution of multiple modular multiplications, and between two modular multiplications a dynamic data address pointer technique modifies the base addresses in the modular multiplication circuit, which greatly speeds up modular exponentiation.

Description

A method for accelerating the RSA encryption/decryption process, and modular multiplication and modular exponentiation circuits therefor

Technical Field

The present invention relates to encryption/decryption methods and hardware circuits, and in particular to a method for accelerating the RSA encryption/decryption process and to the modular multiplication and modular exponentiation circuits used in that method.

Background Art

As the application areas of smart card technology keep expanding, the requirements for information security keep rising. Among the many encryption/decryption algorithms, RSA is a very widely used public-key algorithm; it can be used to implement applications such as digital signatures and data encryption.

The encryption process of RSA can be expressed as $E = C^e \bmod N$ and the decryption process as $C = E^d \bmod N$, where C, E, e, d and N are very large binary numbers, typically 512 bits, 1024 bits or longer. C is the plaintext to be transmitted and (e, N) is the encryption key, of which e is public; the encryption operation produces the ciphertext E. (d, N) is the decryption key, of which d is kept secret; the decryption operation recovers the transmitted plaintext C. The public key and the private key are functions of a pair of large primes. The security of RSA rests on the difficulty of factoring a large number into its prime factors. Its soundness has been borne out by theory and practice, and many chips using the RSA algorithm have been manufactured.
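As a concrete illustration of the two formulas above, here is a minimal sketch with deliberately tiny, insecure parameters. The primes, the exponent and the function names are assumptions chosen for the example and do not come from the patent; real keys are 512 bits or longer, as stated above.

```python
# Toy RSA round trip (illustrative only; parameters are far too small to be secure).
p, q = 61, 53                  # assumed small primes
N = p * q                      # modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)


def encrypt(C: int) -> int:
    """E = C^e mod N"""
    return pow(C, e, N)


def decrypt(E: int) -> int:
    """C = E^d mod N"""
    return pow(E, d, N)


plaintext = 65
ciphertext = encrypt(plaintext)
recovered = decrypt(ciphertext)
```

Python's built-in `pow` hides the cost of the modular exponentiation; the rest of the description is about making exactly that operation fast in hardware.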

The core problem of the RSA encryption/decryption algorithm is modular exponentiation of large numbers. It is computationally heavy and somewhat difficult to implement, but a modular exponentiation can be converted into a sequence of modular multiplications. The modular exponentiation problem can therefore be addressed, and system performance improved, by designing an efficient large-number modular multiplication coprocessor. One decomposition of modular exponentiation is as follows:

    begin
        C_bar = C × R mod N
        X_bar = 1 × R mod N
        for i = u-1 down to 0
            X_bar = MonPro(X_bar, X_bar)
            if (e_i = 1) then X_bar = MonPro(C_bar, X_bar)
        X = MonPro(X_bar, 1)
        return X
    end

Here C, e and N have the meanings given above, and $e_i$ denotes bit i of the exponent e, whose bits are indexed from u-1 down to 0. R is a base coprime with N, usually $R = 2^s$, where s is the number of bits of N. The value X returned at the end is the ciphertext E. MonPro(A, B) denotes the Montgomery algorithm function. This algorithm converts the modular exponentiation into a sequence of large-number modular multiplications. Because MonPro(A, B) returns $A \times B \times R' \bmod N$, where $R' = R^{-1}$ and $R \times R^{-1} \bmod N = 1$, the factor $R'$ has to be eliminated: at the start of the algorithm C is transformed into $\bar{C}$ and X into $\bar{X}$ (C_bar and X_bar in the listing), the computation is carried out on the transformed values, and at the end $\bar{X}$ is transformed back into X.
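The decomposition above can be sketched in Python as follows. This is a minimal sketch, not the patent's circuit: `mon_pro` is a hypothetical reference function that computes $A \times B \times R^{-1} \bmod N$ directly with big-integer arithmetic, and `mod_exp` assumes an odd modulus N so that $R = 2^s$ is invertible.

```python
def mon_pro(a: int, b: int, N: int, R: int) -> int:
    """Reference Montgomery product a * b * R^-1 mod N (computed directly)."""
    return (a * b * pow(R, -1, N)) % N


def mod_exp(C: int, e: int, N: int) -> int:
    """Left-to-right binary exponentiation in the Montgomery domain: C^e mod N."""
    s = N.bit_length()
    R = 1 << s                          # R = 2^s, coprime with an odd N
    C_bar = (C * R) % N                 # transform C into the Montgomery domain
    X_bar = R % N                       # 1 * R mod N
    for i in range(e.bit_length() - 1, -1, -1):   # i = u-1 down to 0
        X_bar = mon_pro(X_bar, X_bar, N, R)       # square
        if (e >> i) & 1:                          # bit e_i is set
            X_bar = mon_pro(C_bar, X_bar, N, R)   # multiply
    return mon_pro(X_bar, 1, N, R)      # transform back: removes the factor R
```

Each loop iteration costs one or two Montgomery products, which is why the speed of a single modular multiplication dominates the cost of the whole exponentiation.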

The original Montgomery modular multiplication algorithm can be stated as follows:

    function REDC(T)
        m = (T mod R) N′ mod R
        t = (T + mN) / R
        if t ≥ N then return t - N
        else return t

$R^{-1}$ and $N'$ satisfy $0 < R^{-1} < N$, $0 < N' < R$ and $R R^{-1} - N N' = 1$. T is a given large integer with $0 \le T \le RN$; in the decomposition of modular exponentiation above, $T = \bar{X} \times \bar{X}$ or $T = \bar{C} \times \bar{X}$. For modular exponentiation and the original Montgomery algorithm, see Chinese Patent Application No. 97110289.9.
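A minimal Python sketch of the REDC function above, assuming R is a power of two and N is odd. Deriving N′ from the relation $R R^{-1} - N N' = 1$ (equivalently $N N' \equiv -1 \pmod R$) with Python's built-in modular inverse is a choice made for this example, not something the text prescribes.

```python
def redc(T: int, N: int, R: int) -> int:
    """Montgomery reduction: returns T * R^-1 mod N for 0 <= T <= R*N."""
    N_prime = (-pow(N, -1, R)) % R      # N * N' ≡ -1 (mod R)
    m = ((T % R) * N_prime) % R
    t = (T + m * N) // R                # exact: T + m*N is a multiple of R
    return t - N if t >= N else t
```

The point of the construction is that the only divisions are by R, which for a power of two reduce to shifts and masks.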

To implement the original Montgomery algorithm, the article "Analyzing and Comparing Montgomery Multiplication Algorithms" (IEEE Micro, June 1996, pages 26-33) describes CIOS, an algorithm for multi-precision Montgomery multiplication that decomposes the Montgomery algorithm into digit-level operations suitable for hardware implementation. As the article shows, CIOS performs the fewest multiplications, additions and read/write operations compared with the other algorithms it covers (SOS, FIPS, FIOS and CIHS). Since the present invention improves on the multi-precision CIOS algorithm, its computation is described in some detail below; for ease of understanding, the variable names have been adjusted to match those used in the invention. For the derivation of the formulas, refer to the article cited above.

In the CIOS algorithm the constant R (as defined above), the multiplier x, the multiplicand y and the modulus N are all s-digit radix-r integers, that is, $x = x_{s-1} x_{s-2} \ldots x_1 x_0$, $y = y_{s-1} y_{s-2} \ldots y_1 y_0$ and $n = n_{s-1} n_{s-2} \ldots n_1 n_0$. S, which holds the result and the intermediate results, has s+2 digits (needed for intermediate storage), that is, $S = S_{s+1} S_s \ldots S_1 S_0$. The radix is $r = 2^k$, where k may be 8, 16, 32 or larger. C1, T1, n′[0] and m are each a single radix-r digit: C1 stores the high digit or carry of a digit operation, and T1 temporarily holds the low digit or sum of a digit operation (this is not repeated below). n′[0] is a constant satisfying $n'[0] = -n[0]^{-1} \bmod 2^k$, and m is an intermediate variable. With this algorithm a large-number modular multiplication can be carried out using digit multiplications, digit additions, carry handling and the corresponding data read operations. As in the original Montgomery algorithm, the computation consists of two inner loops, $S = x \times y$ and $S = (S + mn)/R$, performed alternately inside the same outer loop, followed by a final selection step that returns either S or S - N depending on how S compares with N.

The procedure, with explanations, is as follows. Before the computation starts, every digit of S is set to 0.

    for i = 0 to s-1                          // start the outer loop
    {
        C1 = 0                                // clear the carry
        for j = 0 to s-1                      // inner loop over j
            (C1, T1) = S[j] + x[j]y[i] + C1   // multiply digit i of y by digit j of x, add digit j of S and the carry C1;
                                              // the result goes to T1 and C1
            S[j] = T1                         // assign T1 to digit j of S
        (C1, T1) = S[s] + C1                  // add digit s of S and C1
        S[s] = T1                             // assign the sum to digit s of S
        S[s+1] = C1                           // assign the carry to digit s+1 of S; S now includes y[i] times x
        C1 = 0
        m = S[0]n′[0] mod 2^k                 // compute m from the formula
        (C1, T1) = S[0] + mn[0]               // add mn[0] to S[0]; the result goes to T1 and C1
        for j = 1 to s-1                      // inner loop over j
            (C1, T1) = S[j] + mn[j] + C1      // add digit j of S, mn[j] and the carry C1; the result goes to T1 and C1
            S[j-1] = T1                       // assign T1 to digit j-1 of S
        (C1, T1) = S[s] + C1                  // add C1 from the end of the loop to digit s of S
        S[s-1] = T1                           // assign the sum to digit s-1 of S
        S[s] = S[s+1] + C1                    // add the carry to digit s+1 of S and assign the result to digit s of S
    }
    C1 = 0                                    // start the selection of the result
    for j = 0 to s-1                          // compute S - N in a loop
        (C1, y[j]) = S[j] - n[j] - C1
    (C1, y[s]) = S[s] - C1
    if C1 = 0 then return y                   // no borrow, i.e. S ≥ N: the result is S - N
    else return S                             // borrow: the result is S

Because (S + mn)/R is computed by taking the high s digits of (S + mn), the second inner loop includes a shift by one digit.

In IC smart cards, the large-number modular multiplications of the RSA computation are generally delegated to a coprocessor to speed up the process. The resulting system is shown in FIG. 1. The CPU 1 loads the data to be processed into the system RAM 3 and directs the modular multiplication coprocessor 2 to complete one modular multiplication; the system RAM 3 stores the initial operand data and the results. The CPU core and the coprocessor cooperate to perform multiple modular multiplications and thus complete the modular exponentiation of the RSA encryption/decryption algorithm.
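The CIOS procedure described above can be sketched in Python as follows. This is a software model only: digits are stored least significant first in Python lists, `divmod` stands in for splitting a result into the carry C1 and the low digit T1, and the helper names (`to_digits`, `from_digits`, `cios_mon_pro`) are assumptions of the example. Unlike the listing, the result is returned in a separate list instead of overwriting y.

```python
def to_digits(value, s, k):
    """Split value into s radix-2^k digits, least significant first."""
    mask = (1 << k) - 1
    return [(value >> (k * i)) & mask for i in range(s)]


def from_digits(digits, k):
    """Inverse of to_digits."""
    return sum(d << (k * i) for i, d in enumerate(digits))


def cios_mon_pro(x, y, n, s, k):
    """CIOS Montgomery product of digit lists x, y modulo n (n odd, x < n).

    Returns the digits of x*y*R^-1 mod n with R = 2^(k*s).
    """
    r = 1 << k
    n0_prime = (-pow(n[0], -1, r)) % r          # n'[0] = -n[0]^-1 mod 2^k
    S = [0] * (s + 2)
    for i in range(s):
        C1 = 0
        for j in range(s):                      # first inner loop: S = S + x*y[i]
            C1, T1 = divmod(S[j] + x[j] * y[i] + C1, r)
            S[j] = T1
        C1, T1 = divmod(S[s] + C1, r)
        S[s] = T1
        S[s + 1] = C1
        m = (S[0] * n0_prime) % r
        C1, T1 = divmod(S[0] + m * n[0], r)     # low digit becomes zero and is dropped
        for j in range(1, s):                   # second inner loop: S = (S + m*n) / r
            C1, T1 = divmod(S[j] + m * n[j] + C1, r)
            S[j - 1] = T1
        C1, T1 = divmod(S[s] + C1, r)
        S[s - 1] = T1
        S[s] = S[s + 1] + C1
    # Final selection: return S - N if there is no borrow, else S.
    borrow, diff = 0, [0] * (s + 1)
    for j in range(s):
        d = S[j] - n[j] - borrow
        borrow, diff[j] = (1 if d < 0 else 0), d % r
    d = S[s] - borrow
    borrow, diff[s] = (1 if d < 0 else 0), d % r
    return diff[:s] if borrow == 0 else S[:s]
```

Each inner-loop iteration reads and writes a digit of S, which in the coprocessor means an access to system RAM; this is the traffic the invention sets out to reduce.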

When a coprocessor implements the above algorithm, the basic operations to be carried out are multiplication, addition and read/write operations. In the hardware, multiplication and addition are performed by a multiplication module and an addition module respectively; since the algorithm contains no parallel additions or multiplications, one adder and one multiplier suffice. The large integers (multiplier x, multiplicand y, modulus N and result S) are stored in system RAM, so each read/write operation proceeds under the control of the logic control module: the address of the required operand is first placed from an address register into the RAM address register, and the data are then either read from the corresponding system RAM address into the corresponding operand register or written from the coprocessor to the corresponding system RAM address. Because addition, multiplication, address writing and RAM read/write can use different data lines, they can proceed in parallel within one operation step, but each kind of operation can be performed only once per operation step.

Each iteration of the two inner loops of the existing CIOS algorithm contains two read operations and one write operation, so an iteration takes at least 3 operation steps, and the two inner loops together take 3(s-1) + 3(s-2) = 6s - 9 operation steps (with additions and multiplications running in parallel with reads and writes, and ignoring the initial read operations that cannot be parallelized). Because the coprocessor has to access system RAM frequently, the computation speed is unsatisfactory.

Summary of the Invention

In view of this, the technical problem to be solved by the present invention is to provide a method for accelerating the RSA encryption/decryption process that effectively increases the speed of that process.

To achieve the above object, the present invention provides a method for accelerating the RSA encryption/decryption process, comprising the following steps:

(A) when encrypting the plaintext H to be transmitted, perform the modular exponentiation $E = H^e \bmod N$ to obtain the ciphertext E and complete the encryption, where e and N are the encryption key;
(B) when decrypting the received ciphertext E, perform the modular exponentiation $H = E^d \bmod N$ to obtain the plaintext H and complete the decryption, where d and N are the decryption key.

The modular exponentiation is decomposed into multiple Montgomery modular multiplications, and the method is characterized in that the Montgomery modular multiplication algorithm is implemented as follows. Let the constant R, the multiplier x, the multiplicand y and the modulus N all be s-digit radix-r integers, $x = x_{s-1} x_{s-2} \ldots x_1 x_0$, $y = y_{s-1} y_{s-2} \ldots y_1 y_0$, $n = n_{s-1} n_{s-2} \ldots n_1 n_0$; let S be an (s+1)-digit radix-r integer, $S = S_s S_{s-1} \ldots S_1 S_0$; let $r = 2^k$; let the intermediate variables C1 and T1 each be a single radix-r digit, n′[0] a computation constant, and i and j loop variables. The algorithm further comprises two intermediate variables, a one-bit binary number C and a single radix-r digit T2. Before the computation, the variables S, T1, T2, C1 and C are set to zero. The computation steps are:

(a) set i to 0 and start the outer loop;
(b) add the product of digit 0 of x and digit i of y to digit 0 of S; assign the low digit of the result to T1 and the high digit to C1;
(c) add digit 1 of S to C1; assign the sum to T2 and the carry to C;
(d) multiply T1 by n′[0] and take the remainder modulo $2^k$; assign the result to m;
(e) add the product of m and n[0] to T1; assign the low digit of the result to T1 and the high digit to C1;
(f) set j = 1 and start the inner loop;
(g) add T2, the product of digit j of x and digit i of y, and the carry C1; assign the low digit to T1 and the high digit to C1;
(h) add digit j+1 of S, C1 and C; assign the sum to T2 and the carry to C;
(i) add the product of m and n[j] to T1; assign the low digit to T1 and the high digit to C1;
(j) assign the value of T1 to digit j-1 of S and increment the loop variable j; repeat the inner loop until j equals s, then exit the inner loop;
(k) add C1 to T2; assign the sum to T1 and the carry to C1;
(m) assign the value of T1 to digit s-1 of S;
(n) add C to C1 and assign the sum to digit s of S; increment the loop variable i; repeat the outer loop until i equals s, then exit the outer loop;
(o) reset C to zero;
(p) set j = 0 and start a loop;
(q) subtract digit j of n and the borrow C from digit j of S; assign the difference to digit j of y and the borrow to C; increment the loop variable j; repeat this loop until j equals s, then exit the loop;
(r) subtract the borrow C from digit s of S; assign the difference to digit s of y and the borrow to C;
(s) if the borrow C is zero, return y; otherwise return S.

As can be seen from the above, the invention builds on the multi-precision Montgomery algorithm, takes the multi-precision CIOS algorithm as its basis and improves on it, reducing the number of coprocessor accesses to system RAM and increasing the computation speed.

Another technical problem to be solved by the present invention is to provide a modular multiplication circuit capable of implementing the above Montgomery modular multiplication algorithm.

To achieve the above object, the present invention provides a modular multiplication circuit implementing the Montgomery modular multiplication algorithm, with an operating word length of K, comprising:

data registers, which supply the data for the addition/subtraction and multiplication operations of the algorithm and hold intermediate results;
an address arithmetic module, which supplies the addresses for reading and writing system RAM, so that system RAM data can be read into the data registers or data register contents written to the corresponding location in system RAM;
a multiplication module, which selects the multiplier and multiplicand from the data registers, performs the multiplication and stores the result in specific data registers;
an addition/subtraction module, which selects the addend and augend from the data registers, performs the addition and stores the result in the corresponding data registers;
a logic control module, which generates the control signals that coordinate the whole circuit so that it completes the computation steps of the algorithm in the configured sequence of operation steps, where addition/subtraction, multiplication, read/write and address-write operations can be completed in parallel within one operation step;
a loop arithmetic module, which counts the inner and outer loop iterations and supplies the loop progress information needed for address calculation and loop control;
internal wiring, which carries data between the internal components of the coprocessor and connects through an interface to the buses of the CPU and system RAM; and
a start/stop control module, controlled by the CPU, which starts and stops a modular multiplication.

As can be seen from the above, the modular multiplication circuit of the invention implements the algorithm of the invention, accesses system RAM less often during the multi-precision computation and therefore runs faster. In addition, the structure of the coprocessor has been optimized: a system configuration register is provided so that the coprocessor supports operand lengths from 256 bits to 1024 bits, which makes it more flexible in application.

A further technical problem to be solved by the present invention is to provide a modular exponentiation circuit with a high computation speed.

To achieve the above object, the present invention provides a modular exponentiation circuit that includes the above modular multiplication circuit, with a CPU and system RAM each connected to the modular multiplication circuit. The CPU first places the modular multiplication circuit in the non-operating state and initializes the variables in system RAM and in the modular multiplication circuit. The CPU then places the modular multiplication circuit in the operating state, and the circuit completes one modular multiplication. Next, the CPU adjusts the base addresses of the multiplier, multiplicand and result in the modular multiplication circuit so that they correspond to the system RAM locations of the multiplier, multiplicand and result of the next modular multiplication, and the next modular multiplication is carried out. In this way the CPU directs the modular multiplication circuit through the sequence of modular multiplications required by the decomposition of modular exponentiation, which yields the modular exponentiation result.

It follows that the modular exponentiation circuit of the invention has all the advantages of the modular multiplication circuit and, in addition, uses a dynamic data address pointer technique: between two modular multiplications the stored data do not have to be moved, only the address pointers to the data have to be adjusted, which greatly speeds up modular exponentiation.
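The address-pointer idea can be illustrated with a small software model. Everything specific here is hypothetical and not taken from the patent: the four-slot `ram` layout, the names `x_base`, `s_base`, `c_base` and `one_base`, and the reference `mon_pro` function that stands in for the coprocessor. The point of the sketch is only that after each modular multiplication the result and operand base addresses are swapped instead of copying the result back.

```python
def mon_pro(a: int, b: int, N: int, R: int) -> int:
    """Stand-in for the coprocessor: a * b * R^-1 mod N."""
    return (a * b * pow(R, -1, N)) % N


def mod_exp_with_pointers(C: int, e: int, N: int) -> int:
    """Modular exponentiation where only base 'addresses' change between products."""
    R = 1 << N.bit_length()
    # Hypothetical RAM layout: slot index plays the role of a base address.
    ram = {0: (C * R) % N,   # C_bar, never moved
           1: R % N,         # X_bar
           2: 0,             # result buffer
           3: 1}             # constant 1 for the final conversion
    c_base, x_base, s_base, one_base = 0, 1, 2, 3

    def modmul(a_base, b_base, out_base):
        ram[out_base] = mon_pro(ram[a_base], ram[b_base], N, R)

    for i in range(e.bit_length() - 1, -1, -1):
        modmul(x_base, x_base, s_base)
        x_base, s_base = s_base, x_base      # swap pointers, no data copy
        if (e >> i) & 1:
            modmul(c_base, x_base, s_base)
            x_base, s_base = s_base, x_base
    modmul(x_base, one_base, s_base)
    return ram[s_base]
```

With 1024-bit operands, copying a result back into the operand buffer would cost many RAM accesses per multiplication, whereas swapping two base addresses costs a constant amount of work.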

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the connection between the modular multiplication coprocessor, the CPU and the RAM;
FIG. 2 is a flowchart of the Montgomery modular multiplication algorithm according to an embodiment of the invention;
FIG. 3 is a hardware structure diagram of the modular multiplication coprocessor according to an embodiment of the invention; and
FIG. 4 is a flowchart of the cooperation between the CPU, the coprocessor and the RAM according to an embodiment of the invention.

Detailed Description

The algorithm of the invention builds on the multi-precision Montgomery algorithm, taking the multi-precision CIOS algorithm as its basis and improving on it. Parameters shared with the existing multi-precision CIOS algorithm have the same meaning. The constant R, the multiplier x, the multiplicand y and the modulus N are all s-digit radix-r integers (usually very large binary numbers, for example 1024 or 512 bits long), $x = x_{s-1} x_{s-2} \ldots x_1 x_0$, $y = y_{s-1} y_{s-2} \ldots y_1 y_0$, $n = n_{s-1} n_{s-2} \ldots n_1 n_0$. S, which holds the result and the intermediate results, has s+1 digits, that is, $S = S_s S_{s-1} \ldots S_1 S_0$. The radix is $r = 2^k$, where k (also written K) is the operating word length of the machine and may be 8, 16, 32 or larger. C1, T1 and n′[0] are each a single radix-r digit with the fixed K-bit machine word length: C1 stores the high digit or the carry/borrow of an operation, T1 temporarily holds the low digit or sum of an operation, and n′[0] is a constant. In addition, the algorithm of the invention introduces a one-bit binary number C and a single radix-r digit T2, which store the intermediate results of adding the carry C1 and/or C to the corresponding digit of S. Before the computation starts, every digit of S and the variables C1, T1 and T2 are set to zero.
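To make the operand representation concrete, the sketch below splits an integer into s radix-$2^k$ digits and computes the constant n′[0]. The helper names and the example modulus are assumptions of the sketch; the defining relation $n'[0] = -n[0]^{-1} \bmod 2^k$ is the one given for the CIOS algorithm earlier in the description.

```python
def to_digits(value, s, k):
    """Split value into s radix-2^k digits, least significant digit first."""
    mask = (1 << k) - 1
    return [(value >> (k * i)) & mask for i in range(s)]


def n0_prime(n0, k):
    """n'[0] = -n[0]^-1 mod 2^k; n[0] must be odd, i.e. the modulus must be odd."""
    r = 1 << k
    return (-pow(n0, -1, r)) % r


k, s = 8, 2                       # assumed: 8-bit digits, 2 digits per operand
N = 0xF123                        # assumed odd example modulus
n = to_digits(N, s, k)            # digits n[0], n[1]
np0 = n0_prime(n[0], k)           # constant loaded once per modulus
```

Because n′[0] depends only on the lowest digit of the modulus, it can be computed once and kept in a constant register for all modular multiplications with the same N.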

Referring also to FIG. 2, the flow of the algorithm of the invention is as follows.

    for i = 0; i < s; i++                   // step 100: set i to 0 and start the outer loop; i is incremented on each pass and the loop exits when i equals s
    {
        (C1, T1) = S[0] + x[0]y[i]          // step 102: add the product of digit 0 of x and digit i of y to digit 0 of S; low digit to T1, high digit to C1
        (C, T2) = C1 + S[1]                 // step 104: add digit 1 of S to C1; sum to T2, carry to C
        m = T1 n′[0] mod 2^k                // step 106: multiply T1 by n′[0] and take the remainder modulo 2^k (the low digit); result to m
        (C1, T1) = T1 + mn[0]               // step 108: add the product of m and n[0] to T1; low digit to T1, high digit to C1
        for j = 1; j < s; j++               // step 110: set j = 1 and start the inner loop; j is incremented on each pass and the loop exits when j equals s
        {
            (C1, T1) = T2 + x[j]y[i] + C1   // step 112: add T2, the product of digit j of x and digit i of y, and the carry C1; low digit to T1, high digit to C1
            (C, T2) = S[j+1] + C1 + C       // step 114: add digit j+1 of S, C1 and C; sum to T2, carry to C
            (C1, T1) = T1 + mn[j]           // step 116: add the product of m and n[j] to T1; low digit to T1, high digit to C1
            S[j-1] = T1                     // step 118: assign T1 to digit j-1 of S and end this pass of the inner loop
        }
        (C1, T1) = T2 + C1                  // step 120: add C1 to T2; sum to T1, carry to C1
        S[s-1] = T1                         // step 122: assign T1 to digit s-1 of S
        S[s] = C1 + C                       // step 124: add C to C1, assign the sum to digit s of S and end this pass of the outer loop
    }
    C = 0                                   // step 124: reset C to zero
    for j = 0; j < s; j++                   // step 126: set j = 0 and start a loop; j is incremented on each pass and the loop exits when j equals s
        (C, y[j]) = S[j] - n[j] - C         // step 128: subtract digit j of n and the borrow C from digit j of S; difference to digit j of y, borrow to C
    (C, y[s]) = S[s] - C                    // step 130: subtract the borrow C from digit s of S; difference to digit s of y, borrow to C
    if C = 0 then return y                  // step 132: if the borrow C is zero, return y; otherwise return S
    else return S
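The flow above can be modelled in Python as follows, with the step numbers of the listing kept as comments. This is a software sketch under stated assumptions, not the circuit: digits are stored least significant first in lists, `divmod` stands in for splitting a sum into its carry and low digit, the function and helper names are made up for the example, and the result is returned in a separate list rather than written into y.

```python
def to_digits(value, s, k):
    """Split value into s radix-2^k digits, least significant first."""
    mask = (1 << k) - 1
    return [(value >> (k * i)) & mask for i in range(s)]


def from_digits(digits, k):
    """Inverse of to_digits."""
    return sum(d << (k * i) for i, d in enumerate(digits))


def improved_mon_pro(x, y, n, s, k):
    """Single-inner-loop Montgomery product; returns digits of x*y*2^(-k*s) mod n."""
    r = 1 << k
    n0_prime = (-pow(n[0], -1, r)) % r
    S = [0] * (s + 1)
    for i in range(s):                                   # step 100
        C1, T1 = divmod(S[0] + x[0] * y[i], r)           # step 102
        C, T2 = divmod(C1 + S[1], r)                     # step 104
        m = (T1 * n0_prime) % r                          # step 106
        C1, T1 = divmod(T1 + m * n[0], r)                # step 108
        for j in range(1, s):                            # step 110
            C1, T1 = divmod(T2 + x[j] * y[i] + C1, r)    # step 112
            C, T2 = divmod(S[j + 1] + C1 + C, r)         # step 114
            C1, T1 = divmod(T1 + m * n[j], r)            # step 116
            S[j - 1] = T1                                # step 118
        C1, T1 = divmod(T2 + C1, r)                      # step 120
        S[s - 1] = T1                                    # step 122
        S[s] = C1 + C                                    # step 124
    borrow, diff = 0, [0] * (s + 1)                      # C = 0
    for j in range(s):                                   # steps 126-128
        d = S[j] - n[j] - borrow
        borrow, diff[j] = (1 if d < 0 else 0), d % r
    d = S[s] - borrow                                    # step 130
    borrow, diff[s] = (1 if d < 0 else 0), d % r
    return diff[:s] if borrow == 0 else S[:s]            # step 132
```

In this model each pass of the inner loop touches `x[j]`, `S[j + 1]` and `n[j]` once and writes `S[j - 1]` once, while T1, T2, C1 and C stay in local variables, which mirrors keeping them in coprocessor registers.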

The algorithm of the invention replaces the two inner loops of the existing CIOS algorithm with a single one (which also performs the shift). The expressions of steps 112 and 116 alternate within the same inner loop, and step 114 is added to handle the carries. As a result of this change, the algorithm of the invention accesses system RAM markedly less often. Its result is identical to that of the existing CIOS algorithm; since this is not the focus of the invention it is not shown in detail, but it can be proved by working through one pass of the outer loop.

The inner loop of the algorithm of the invention contains only three read operations (x[j], S[j+1], n[j]) and one write operation (S[j-1]). However, because it contains five additions, one pass of the inner loop takes 5 operation steps (when the first pass begins, the variable values in the expression of step 112 have already been obtained by earlier steps, as explained in detail below). The inner loop therefore needs 5(s-2) = 5s - 10 operation steps in total, s+1 fewer than the existing CIOS algorithm. Multiplied by the s-1 passes of the outer loop, one modular multiplication saves $s^2 - 1$ operation steps (s may be 32, 64, and so on).
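The step counts quoted above can be checked with a few lines of arithmetic. The function names are made up for the sketch; the formulas 6s - 9, 5s - 10 and the factor s-1 are the ones stated in the text.

```python
def inner_steps_cios(s: int) -> int:
    """Operation steps of the two CIOS inner loops: 3(s-1) + 3(s-2) = 6s - 9."""
    return 3 * (s - 1) + 3 * (s - 2)


def inner_steps_improved(s: int) -> int:
    """Operation steps of the single merged inner loop: 5(s-2) = 5s - 10."""
    return 5 * (s - 2)


def steps_saved_per_modmul(s: int) -> int:
    """Saving per modular multiplication: (s+1) per outer pass times (s-1) passes."""
    return (inner_steps_cios(s) - inner_steps_improved(s)) * (s - 1)


for s in (32, 64):
    print(s, inner_steps_cios(s), inner_steps_improved(s), steps_saved_per_modmul(s))
```

For s = 32 the per-pass counts are 183 and 150 steps, so the saving is 33 × 31 = 1023 = 32² - 1 steps per modular multiplication.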

In the outer loop, the algorithm of the invention performs slightly fewer additions, multiplications and read/write operations in total than the existing CIOS algorithm, and the two need essentially the same number of operation steps. The final selection step is the same in both algorithms. Consequently, when implemented on a coprocessor, the algorithm of the invention runs markedly faster than the original algorithm.

Fig. 4 is a hardware structure diagram of the modular multiplication coprocessor of an embodiment of the invention. By function it can be divided into: data registers, an address operation module, a multiplication module, an addition module, a loop operation module, a logic control module, internal wiring, and several special-function modules.

The data registers supply operand data and hold intermediate results. The x[i] register 7 and y[i] register 11 are the multiplier and multiplicand registers, used to load the multi-precision operands. The T1 register 8, m register 9, n[i]/S[i] register 12, T2 register 48 and C1 register 49 are intermediate-result registers; they hold intermediate results and also take part in operations as operands. K-bit result register H 16 and K-bit result register L 17 store the high and low words of the multiplication result. The n'[0] register 10 is the operation-constant register. All of the registers mentioned so far are K bits wide. The Ycb register 51 and Ycc register 52 are one-bit registers that hold the carry bit of addition/subtraction results. In addition there is a zero-constant register and a RAM data register that temporarily holds data to be written to system RAM.

The address operation module supplies the addresses for reading and writing system RAM. The n[i] base register 24, x[i] base register 25 and y[i] base register 26 hold the base addresses of the operands n, x and y respectively. The S[i] base register 28 and the S[i] base-minus-1 register 27 hold the base address of the result S and that base address minus 1. Selector MUX 29 selects the base address currently in use. Address arithmetic unit 34 combines the base address selected by MUX 29 with the loop variable to obtain the current address. RAM address register 35 receives the output of address arithmetic unit 34 and serves as the address used when system RAM is read or written.
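The text does not state exactly how address arithmetic unit 34 combines a base address with the loop variable. The sketch below assumes the simplest rule, word-addressed base plus loop variable; the function name `ram_address` and the example address are hypothetical. Under that assumption, the base-minus-1 register allows the address of S[j-1] to be formed directly from j, with no separate decrement of the loop variable.

```python
def ram_address(base: int, loop_var: int) -> int:
    """Assumed address rule: selected base address plus loop variable (word-addressed)."""
    return base + loop_var


S_BASE = 0x40                  # hypothetical base address of S in system RAM
S_BASE_MINUS_1 = S_BASE - 1    # value held in the base-minus-1 register

j = 5
# Selecting the base-minus-1 register with loop variable j addresses S[j-1].
assert ram_address(S_BASE_MINUS_1, j) == ram_address(S_BASE, j - 1)
```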

The multiplication module performs multiplication. Selector MUX 13 and selector MUX 14 are connected to the registers holding, respectively, the multiplicands x[i], T1, m and the multipliers n'[0], y[i], n[i] used in the multiplications of the algorithm. The K×K-bit multiplier 15 takes the outputs of these selectors as its inputs and writes the high and low words of its result to K-bit result register H 16 and K-bit result register L 17.

The addition/subtraction module performs addition and subtraction. Selector MUX 19 (which may consist of two selectors) fetches the data to be added from the K-bit result registers, the T1 register, the T2 register, the C1 register and other data registers. The K-bit adder/subtractor (with carry bit) 20 performs the addition or subtraction, and the result is saved in data registers such as C1, T1, C, T2, y[i] and n[i]/S[i]. K-bit accumulator 18 accumulates the carry of the addition with the data in K-bit result register H 16, which completes the addition of a K-bit number to a 2K-bit product.
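The split between the adder and the accumulator can be illustrated with a short model: the K-bit adder handles the low word, and the accumulator folds the resulting carry into the high word. The function name `add_word_to_product` is an assumption for illustration only.

```python
def add_word_to_product(high: int, low: int, word: int, k: int):
    """Add a K-bit word to a 2K-bit product held as (high, low) K-bit words."""
    mask = (1 << k) - 1
    total_low = low + word            # K-bit adder on the low word
    carry = total_low >> k            # carry out of the low word
    new_high = (high + carry) & mask  # accumulator adds the carry to the high word
    return new_high, total_low & mask


# Largest 8-bit product 0xFF * 0xFF = 0xFE01, plus the largest 8-bit word 0xFF.
print(add_word_to_product(0xFE, 0x01, 0xFF, 8))
```

When (high, low) is the product of two K-bit values, the high word cannot overflow, because (2^K - 1)^2 + (2^K - 1) is less than 2^(2K).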

The logic control module generates the control signals that coordinate the whole circuit, carrying out the operation control and timing control of the modular multiplication coprocessor. Operation-step arithmetic unit 37 generates the value of operation-step register 38 for the next clock cycle from the current contents of operation-step register 38 and the output of loop-variable comparator 36. Operation-step decoder 39 decodes the current value of operation-step register 38 into the control signals that coordinate the circuit. The clock signal is supplied externally.

The loop operation module counts and compares the inner- and outer-loop iterations. Loop-variable comparator 36 compares the value of loop-variable arithmetic unit 31 with the loop count set in system configuration register 46, reflecting the progress of the inner and outer loops. The values of outer loop counter A 32 and inner loop counter B 33 are combined with the base registers described above in address arithmetic unit 34 to generate the system RAM addresses of the multi-precision operands n, x, y and of the result S during the computation. Loop-variable arithmetic unit 31 controls the number of inner- and outer-loop iterations.

Internal wiring. On the one hand, the coprocessor's internal wiring is connected through an interface to the data bus and address bus of the CPU and system RAM; writes to n[i] base register 24, x[i] base register 25, y[i] base register 26 and S[i] base register 28 are controlled by the CPU write control signal 40, CPU data 4, CPU RD 41 and CPU address 42. On the other hand, inside the coprocessor, the selectors for addition/subtraction and multiplication are connected by data lines to the registers holding the addends, augends, multipliers and multiplicands of the algorithm, and the output of the addition/subtraction module (K-bit adder/subtractor 20 and K-bit accumulator 18) is connected to intermediate-result registers such as T2 register 48, C1 register 49, T1 register 8, m register 9 and n[i]/S[i], so that data can be transferred between them. In short, any two components between which the algorithm requires data transfer are connected by internal wiring. Addition, multiplication, address writes and RAM reads/writes never use the same data line at the same time, so they can proceed in parallel within one operation step.

Among the special-function modules, system enable register 47 is controlled by the CPU and is used to start and stop a modular multiplication.

System configuration register 46 sets the number of inner- and outer-loop iterations, which corresponds to the operand length. For example, with a machine word length of 32 bits, a loop count of 16 or 32 corresponds to operands of 512 or 1024 bits respectively. Its value is compared with that of loop-variable arithmetic unit 31 to give operation-step arithmetic unit 37 information on loop progress. This setting lets the system perform modular multiplication on operands of different lengths, which makes the coprocessor more flexible.
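The relation between operand length, word length and the configured loop count in the example above is a plain division. A small sketch follows; the helper name `loop_count` is hypothetical.

```python
def loop_count(operand_bits: int, word_bits: int = 32) -> int:
    """Loop count to configure for a given operand length and machine word length."""
    if operand_bits % word_bits:
        raise ValueError("operand length must be a whole number of machine words")
    return operand_bits // word_bits


print(loop_count(512), loop_count(1024))
```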

The algorithm schedule corresponding to the contents of operation-step register 38 is set out below. Each number is the step number held in operation-step register 38, and the text after it gives the operations carried out in that step under the control of operation-step decoder 39. Please refer to the algorithm flow of the invention at the same time, since the operations follow the algorithm step by step.

Outer loop begins.

1. Write the address of x[0] to RAM address register 35.
2. Read the system RAM value (at the address in the RAM address register; likewise below) into x[i] register 7. Write the address of y[i] to RAM address register 35.
3. Read the system RAM value into y[i] register 11. Write the address of S[0] to RAM address register 35.
4. Multiply x[i] register 7 by y[i] register 11 and store the result in K-bit result registers H 16 and L 17. Read the system RAM value into n[i]/S[i] register 12. Write the address of S[1] to RAM address register 35.
5. Add the product in K-bit result registers H 16 and L 17 to n[i]/S[i] register 12 and store the result in C1 register 49 (high word; likewise below) and T1 register 8 (low word; likewise below). Read the system RAM value into n[i]/S[i] register 12. Write the address of n[0] to RAM address register 35.
6. Add C1 register 49 and n[i]/S[i] register 12 and store the result in Ycb register 51 (carry) and T2 register 48 (sum). Multiply T1 register 8 by n'[0] register 10 and store the result in m register 9 (only the low word is kept, because the result is reduced modulo 2^k). Read the RAM value into n[i]/S[i] register 12. Write the address of x[1] to RAM address register 35.
7. Multiply m register 9 by n[i]/S[i] register 12 and store the result in K-bit result registers H 16 and L 17. Read the system RAM value into x[i].
8. Add K-bit result registers H 16 and L 17 to T1 register 8 and store the result in C1 register 49 and T1 register 8.

Inner loop begins; at this point inner loop counter B 33 holds the value 1.

9. Add C1 register 49 and T2 register 48 and store the result in Ycc register 52 (carry) and T1 register 8. Multiply x[i] register 7 by y[i] register 11 and store the result in K-bit result registers H 16 and L 17. Write the address of S[j+1] to RAM address register 35.
10. Add K-bit result registers H 16 and L 17 to T1 register 8 and store the result in C1 register 49 and T1 register 8. Read the system RAM value into n[i]/S[i] register 12. Write the address of n[j] to RAM address register 35.
11. Add C1 register 49 and n[i]/S[i] register 12 with carry-in from Ycb register 51 and store the result in Ycb register 51 and T2 register 48. Read the system RAM value into n[i]/S[i] register 12. Write the address of x[j+1] to RAM address register 35.
12. Multiply m register 9 by n[i]/S[i] register 12 and store the result in K-bit result registers H 16 and L 17. Add T2 register 48 and zero constant 53 with carry-in from Ycc register 52 and store the result in Ycb register 51 and T2 register 48. Read the system RAM value into x[i] register 7. Write the address of S[j-1] to RAM address register 35. Increment inner loop counter B 33.
13. Add the multiplication result in K-bit result registers H 16 and L 17 to T1 register 8; store the high word in C1 register 49 and the low word in T1 register 8 and RAM data register 21, and write the data to system RAM at the address in the RAM address register (the write to system RAM actually takes place in the step after the data is written to the RAM data register; it is listed here for convenience).

Repeat steps 9 to 13 until the value j of inner loop counter B 33 equals s, at which point the inner loop ends.

14. Add C1 register 49 and T2 register 48; store the high part of the result in C1 register 49 and the low part in T1 register 8 and RAM data register 21. Write the address of S[j-1] to RAM address register 35. Increment inner loop counter B 33 (clearing it).
15. Add C1 register 49 and zero constant 53 with carry-in from the Ycb register; store the result in Ycc register 52, Ycb register 51 and RAM data register 21. Write the value of the RAM data register to system RAM. Write the address of S[j] to RAM address register 35. Increment outer loop counter A 32.
16. Write the value of the RAM data register to system RAM and evaluate the jump condition.

Repeat the outer loop; when the value of outer loop counter A 32 equals s, the outer loop ends.

Subtraction loop begins; the initial value j of inner loop counter B 33 is 0.

17. Write the address of n[j] to RAM address register 35.
18. Read the system RAM value into n[i]/S[i] register 12. Write the address of S[j] to RAM address register 35.
19. Copy n[i]/S[i] register 12 into T2 register 48. Read the system RAM value into n[i]/S[i] register 12. Write the address of y[j] to RAM address register 35. Increment inner loop counter B 33.
20. Subtract T2 register 48 from n[i]/S[i] register 12 with borrow-in from Ycb register 51; store the borrow in Ycb register 51 and the difference in RAM data register 21, which is written to system RAM in the next step.

Repeat steps 17 to 20 until the value of inner loop counter B 33 equals s, at which point the subtraction loop ends.

21. Write the address of S[j] (j = s) to RAM address register 35.
22. Read the system RAM value into n[i]/S[i] register 12. Write the address of y[j] to RAM address register 35.
23. Subtract 0 from n[i]/S[i] register 12 with borrow-in from Ycb register 51; store the result in Ycb register 51 and RAM data register 21 and write it to system RAM. Increment inner loop counter B 33.

The operation ends. The result is held in S or y in system RAM.

The steps above carry out the modular multiplication algorithm of the embodiment. As can be seen, one pass of the inner loop does indeed take 5 steps.

The coprocessor of the invention is mainly used in smart-card integrated circuits to compute multi-precision modular multiplication quickly, but it is also applicable to other circuits that perform large-number modular multiplication. It should be noted that the invention is not limited to the specific circuit of the embodiment; those skilled in the art may vary the circuit on the basis of the algorithm of the invention.

The system RAM control signal selector 46 in Fig. 3 is not located inside the coprocessor. It selects the address and control signals for system RAM and is itself controlled by system enable register 47. While the algorithm is running, the control and address signals of system RAM are driven by operation-step decoder 39 and RAM address register 35; when it is not running, they are driven by CPU RD 41, CPU data 4 and CPU address 42.

When the coprocessor of the above embodiment is used in a smart-card circuit, it is connected to the CPU and system RAM as shown in Fig. 1, and the workflow for one large-number modular multiplication is as shown in Fig. 4. CPU 1 first places the coprocessor in the non-working state and initializes the variables in system RAM and in the coprocessor. Specifically, it clears system enable register 47 to put the modular multiplication coprocessor in the non-working state and writes the operands x, y and n to system RAM 3. It then assigns initial values to the registers in coprocessor 2: it writes the base addresses of x, y and n to the corresponding base registers (n[i] base register 24, x[i] base register 25, y[i] base register 26), stores the base address of the result data in S[i] base register 28, writes the operation constant to n'[0] register 10, and writes the operand length information to system configuration register 46 (step 200). CPU 1 then sets system enable register 47, which puts the coprocessor in the working state, and the coprocessor begins the modular multiplication (step 202). CPU 1 waits for the coprocessor to finish and checks the value of the Ycb register in the coprocessor: if Ycb is zero (no borrow), the result is in system RAM starting at the address in y[i] base register 26; if Ycb is 1 (borrow), the result is in system RAM starting at the address in S[i] base register 28 (step 204).

Completing one RSA encryption requires one modular exponentiation, whose algorithm has been listed in the background art. When the modular exponentiation starts, the operands that must first be stored in system RAM are x, c and n; suppose their start addresses are DZ1, DZ2 and DZ3. Space is also set aside in RAM for the modular multiplication result S, with start address DZ4. Under the control of the CPU, the coprocessor must first complete the modular multiplication MonPro(x, x), and the result must then take part in the next operation.

Because the multiplier and multiplicand differ from one modular multiplication to the next, the invention uses a dynamic data address pointer technique: between two modular multiplications the stored data need not be moved, only the address pointers to the data adjusted. This greatly speeds up modular exponentiation. The method is as follows.

When the smart-card circuit of the invention computes MonPro(x, x), the CPU sets the base registers of the multiplier x[i], the multiplicand y[i], n[i] and S[i] to the start addresses of x, x, n and S in system RAM, namely DZ1, DZ1, DZ3 and DZ4, so that the x[i] and y[i] base registers hold the same address. When the coprocessor finishes one modular multiplication, the result is either S or y. The start address of S in system RAM is unchanged, whereas y replaces the original x and is stored in the space that held x, with start address DZ1.

If the next modular multiplication is MonPro(c, x), both the multiplier and the multiplicand have changed. The CPU repoints the x[i] base register at DZ2, the start address of c, and points the y[i] base register at the start address of the previous result (S or y, that is, DZ4 or DZ1). If the previous result was y, the RAM space of S must be cleared. If it was S, the S[i] base register must also be repointed at DZ1, the start address of y, and that RAM space cleared so that it can hold the S values of this modular multiplication.

If the next modular multiplication is MonPro(x, x), both the x[i] and y[i] base registers must point at the start address of the previous result (S or y, that is, DZ4 or DZ1). If the previous result was y, the RAM space of S must be cleared. If it was S, the S[i] base register must also be repointed at DZ1, the start address of y, and that RAM space cleared so that it can hold the S values of this modular multiplication.

In this way, a modular multiplication with a changed multiplier and multiplicand is carried out simply by changing the values of the base registers in the coprocessor.
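The pointer technique can be sketched in Python by modelling system RAM as four named regions and the base registers as variables. This is a simplified illustration under stated assumptions: `coprocessor` is a big-integer stand-in for one MonPro run and not the circuit's digit-level algorithm, the running value is taken to start as R mod n, and the last line leaves Montgomery form with ordinary arithmetic. Neither of those two choices is specified in this section. The names `coprocessor`, `modexp`, `run`, `y_base` and `s_base` are hypothetical.

```python
def coprocessor(x, y, n, R):
    """Stand-in for one MonPro run: returns (S, S - n, borrow)."""
    m = (x * y * -pow(n, -1, R)) % R
    S = (x * y + m * n) // R              # exact division; S < 2n
    return S, S - n, S < n                # borrow set means the result is S


def modexp(c, e, n, bits):
    """c**e mod n by square-and-multiply, moving base pointers instead of data."""
    R = 1 << bits
    ram = {"DZ1": R % n, "DZ2": c * R % n, "DZ3": n, "DZ4": 0}
    y_base, s_base = "DZ1", "DZ4"         # the running value lives at y_base

    def run(x_base):
        nonlocal y_base, s_base
        S, diff, borrow = coprocessor(ram[x_base], ram[y_base], ram["DZ3"], R)
        if borrow:                        # result is S: swap the roles of the regions
            ram[s_base] = S
            y_base, s_base = s_base, y_base
        else:                             # result is y: it replaces the old value in place
            ram[y_base] = diff
        ram[s_base] = 0                   # clear the region that will hold the next S

    for bit in bin(e)[2:]:
        run(y_base)                       # MonPro(x, x): both operands at the same base
        if bit == "1":
            run("DZ2")                    # MonPro(c, x): x[i] base points at c
    return ram[y_base] * pow(R, -1, n) % n   # assumed conversion out of Montgomery form


print(modexp(65, 17, 3233, 16) == pow(65, 17, 3233))
```

Only the two variables `y_base` and `s_base` change between multiplications; no operand is copied from one region to another, which is the point of the technique.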

It will be understood that although the modular multiplication and modular exponentiation circuits of the invention are embodied here as a smart-card circuit and coprocessor, their application is not limited to this.

Claims (9)

  1. A method for accelerating the RSA encryption/decryption process, comprising the following steps: (A) when encrypting a plaintext H to be transmitted, performing a modular exponentiation according to E = H^e mod N to obtain the ciphertext E and complete the encryption, where e and N are the encryption key; (B) when decrypting a received ciphertext E, performing a modular exponentiation according to H = E^d mod N to obtain the plaintext H and complete the decryption, where d and N are the decryption key; the modular exponentiation being decomposed into multiple runs of a Montgomery modular multiplication algorithm, characterized in that the Montgomery modular multiplication algorithm is implemented as follows: let the constant R, the multiplier x, the multiplicand y and the modulus N all be s-digit base-r integers, x = x_{s-1} x_{s-2} … x_1 x_0, y = y_{s-1} y_{s-2} … y_1 y_0, n = n_{s-1} n_{s-2} … n_1 n_0; S is an (s+1)-digit base-r integer, S = S_s S_{s-1} … S_1 S_0; r = 2^k; the intermediate variables C1 and T1 are each a one-digit base-r number, n'[0] is the operation constant, and i and j are loop variables; characterized in that the algorithm further includes the intermediate variables C, a one-bit binary number, and T2, a one-digit base-r number; before the operation the variables S, T1, T2, C1 and C are all set to zero; and the operation steps are as follows:
     (a) set i to 0 and begin the outer loop;
     (b) add the product of digit 0 of x and digit i of y to digit 0 of S; assign the low word of the result to T1 and the high word to C1;
     (c) add digit 1 of S to C1; assign the sum to T2 and the carry to C;
     (d) multiply T1 by n'[0], take the remainder modulo 2^k, and assign the result to m;
     (e) add the product of m and n[0] to T1; assign the low word of the result to T1 and the high word to C1;
     (f) set j = 1 and begin the inner loop;
     (g) add T2, the product of digit j of x and digit i of y, and the carry C1; assign the low word to T1 and the high word to C1;
     (h) add digit j+1 of S, C1 and C; assign the sum to T2 and the carry to C;
     (i) add the product of m and n[j] to T1; assign the low word to T1 and the high word to C1;
     (j) assign the value of T1 to digit j-1 of S, increment the loop variable j by 1, and repeat the inner loop until j equals s, then exit the inner loop;
     (k) add C1 to T2; assign the sum to T1 and the carry to C1;
     (m) assign the value of T1 to digit s-1 of S;
     (n) add C to C1 and assign the sum to digit s of S, increment the loop variable i by 1, and repeat the outer loop until i equals s, then exit the outer loop;
     (o) reset C to zero;
     (p) set j = 0 and begin a loop;
     (q) subtract digit j of n and the borrow C from digit j of S; assign the difference to digit j of y and the borrow to C; increment the loop variable j by 1 and repeat this loop until j equals s, then exit the loop;
     (r) subtract the borrow C from digit s of S; assign the difference to digit s of y and the borrow to C; and
     (s) if the borrow C is zero, return y; otherwise return S.
  2. 2.一种实现如权利要求1中蒙格玛丽模乘算法的模乘运算电路,运算字长为K,包括:数据寄存器,用于提供所述算法中加/减法运算和乘法运算的数据及保存运算的中间结果;地址运算模块,用于提供对系统RAM读写的地址,以将系统RAM数据读入数据寄存器,或将数据寄存器的数据写入系统RAM的相应位置;乘法运算模块,用于从数据寄存器中选择进行运算的乘数和被乘数,执行乘法运算,并将运算结果保存到特定的数据寄存器中;加/减法运算模块,用于从数据寄存器中选择进行运算的加数和被加数,执行加法运算,并将运算结果保存在相应的数据寄存器中;逻辑控制模块,用于生成各种控制信号协调整个电路的工作,使其按设定的操作步顺序完成所述算法中的运算步骤,其中加/减法、乘法、读/写及写地址的操作可以在一个操作步中并行完成;循环运 A modular multiplication circuit as implemented in a modular multiplication algorithm Mary Munger claims arithmetic word length is K, comprising: a data register for providing a data Add / subtraction and multiplication of the algorithm and intermediate results of the calculation; the address calculation module, for providing read and write addresses of the system RAM, system RAM in order to read the data into the data register or the data register corresponding to the position of writing the system RAM; multiplication module, with for computing from the data register selection to the multiplier and multiplicand, perform the multiplication, and save the operation results to a specific data register; addition / subtraction module, for selecting operation from the data register addend and the addend, addition operation is performed, the operation result stored in the corresponding data register; logic control module for generating various control signals to coordinate the operation of the entire circuit, so as to complete the setting by the operation step sequence algorithm calculating step, wherein the operating addition / subtraction, multiplication, read / write and the write address can be done in a parallel operation step; loop operation 模块,用于对内外循环运算进行计数,并提供地址运算和循环控制所需的循环进程信息;内部线路,完成协处理器内部部件间的数据传输,并通过接口与CPU、系统RAM的总线相连;以及启停控制模块,由CPU控制,用来启动和停止一次模乘运算的过程。 Means for counting the operation of the external circulation, and provides an address calculation cycle and control cycles required to process information; internal line, data transmission is completed between the internal components of the coprocessor, and is connected 
via an interface to the CPU, system RAM bus ; and a start-stop control module controlled by the CPU, is used to start and stop the process once a modular multiplication arithmetic.
  3. 3.如权利要求2所述的模乘运算电路,其特征在于,还包括系统配置寄存器,用于存储内外循环运算的循环次数;所述循环运算模块包括循环变量运算器、外部循环计数器、内部循环计数器及循环变量比较器,其中循环变量运算器对外部和内部循环计数器执行加1运算,所述循环变量比较器比较系统配置寄存器和循环变量运算器中的值,提供循环运算的进程信息。 3. The modular multiplication circuit according to claim 2, characterized in that the system further comprises a configuration register for storing the number of internal and external circulation loop operation; the circulation loop comprises a calculation variable calculation module, an external loop counter, internal loop counter and loop variable of the comparator, wherein the loop variable operator to external and internal loop counter 1 performs its operation, the comparator compares the value of the loop variable system configuration register and loop variable computing unit, providing operation cycle process information.
  4. 4.如权利要求2所述的模乘运算电路,其特征在于,所述数据寄存器中变量n和S共用一个K位寄存器,采用了两个一位二进制寄存器保存加法的进/借位位,采用两个专用的K位结果寄存器分别保存乘法结果的高位和低位,还设有一个0常数寄存器和一个用于暂存需写入系统RAM数据的RAM数据寄存器。 4. The modular multiplication circuit according to claim 2, wherein said data register variables n and S share a K-bit register, using a two-bit binary register save adder carry / borrow bit, using two K-bit result of dedicated registers are stored high and low multiplication result, also a constant register 0 and register a data RAM for temporarily storing required data writing system RAM.
  5. 5.如权利要求2所述的模乘运算电路,其特征在于,所述地址运算模块包括n[i]、x[i]、y[i]、S[i]基址寄存器组、选择器、地址运算器及RAM地址寄存器,所述地址运算器将选择器选择的基址与循环变量一起运算,并将运算结果写入RAM地址寄存器中。 5. The modular multiplication circuit according to claim 2, wherein said address calculation module comprises n [i], x [i], y [i], S [i] a base register group, the selector , and the address arithmetic RAM address register, said address arithmetic operator together with the base address selected by the selector loop variable, and the result is written in the RAM address register.
  6. The modular multiplication circuit according to claim 4, characterized in that the addition/subtraction module comprises a selector, a K-bit adder/subtractor with a carry bit, and a K-bit accumulator, wherein the K-bit accumulator accumulates the carry of the addition and the data of the K-bit result register that stores the high word of the multiplication result.
  7. The modular multiplication circuit according to claim 3, characterized in that the logic control module comprises an operation-step arithmetic unit, an operation-step register, and an operation-step decoder, wherein the operation-step arithmetic unit generates the value of the operation-step register for the next clock cycle from the current contents of the operation-step register and the output of the loop variable comparator, and the operation-step decoder decodes the current value of the operation-step register into the control signals that coordinate the operation of the whole circuit.
  8. A modular exponentiation circuit comprising the modular multiplication circuit according to claim 2, wherein a CPU and a system RAM are each connected to the modular multiplication circuit; the CPU first places the modular multiplication circuit in a non-working state and initializes the variables in the system RAM and in the modular multiplication circuit; the CPU then places the modular multiplication circuit in a working state, and the modular multiplication circuit completes one modular multiplication; the CPU then adjusts the base addresses of the multiplier, multiplicand, and result in the modular multiplication circuit so that they correspond to the storage locations in the system RAM of the multiplier, multiplicand, and result of the next modular multiplication, after which the next modular multiplication is performed; the modular exponentiation result is obtained once the CPU has, in this manner, controlled the modular multiplication circuit to complete the multiple modular multiplications required by the decomposition algorithm of the modular exponentiation.
  9. The modular exponentiation circuit according to claim 8, characterized in that, after the modular multiplication circuit completes one modular multiplication, the CPU determines the storage location of the result according to the value of the borrow bit produced by the selection operation of the modular multiplication algorithm.
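
The arithmetic that the claims above distribute across hardware modules can be illustrated in software. The following is a minimal sketch only, assuming a 32-bit word width K and a textbook word-serial Montgomery multiplication with a single merged inner loop; it is not the patent's exact operation sequence. The names `to_words`, `from_words`, `mont_mul`, `mod_exp`, and the three-slot `ram` list are hypothetical and chosen for illustration. The final subtraction shows how a borrow bit can select which of two buffers holds the result (compare claim 9), and the exponentiation loop swaps buffer indices between multiplications instead of copying data (compare the base address adjustment in claim 8).

```python
K = 32                       # assumed word width in bits
MASK = (1 << K) - 1

def to_words(v, s):
    """Split integer v into s little-endian K-bit words."""
    return [(v >> (K * i)) & MASK for i in range(s)]

def from_words(w):
    """Reassemble an integer from little-endian K-bit words."""
    return sum(d << (K * i) for i, d in enumerate(w))

def mont_mul(x, y, n, n0_inv):
    """Return x*y*R^-1 mod n on word lists, with R = 2^(K*len(n))."""
    s = len(n)
    S = [0] * (s + 1)
    for i in range(s):                        # outer loop
        t = S[0] + x[i] * y[0]
        m = (t * n0_inv) & MASK               # makes the low word vanish
        carry = (t + m * n[0]) >> K
        for j in range(1, s):                 # single merged inner loop
            t = S[j] + x[i] * y[j] + m * n[j] + carry
            S[j - 1] = t & MASK               # store shifted down one word
            carry = t >> K
        t = S[s] + carry
        S[s - 1] = t & MASK
        S[s] = t >> K
    # Selection step: compute D = S - n and keep the borrow bit.
    D, borrow = [], 0
    for j in range(s):
        d = S[j] - n[j] - borrow
        D.append(d & MASK)
        borrow = 1 if d < 0 else 0
    borrow = 1 if S[s] - borrow < 0 else 0
    return S[:s] if borrow else D             # borrow set means S < n

def mod_exp(base, exp, mod):
    """Left-to-right square-and-multiply; mod must be odd and > 1."""
    s = (mod.bit_length() + K - 1) // K
    R = 1 << (K * s)
    n = to_words(mod, s)
    n0_inv = (-pow(n[0], -1, 1 << K)) & MASK  # needs Python 3.8+
    ram = [to_words(base * R % mod, s),       # slot 0: base, Montgomery form
           to_words(R % mod, s),              # slot 1: running result (= 1)
           [0] * s]                           # slot 2: scratch
    b, acc, out = 0, 1, 2                     # stand-ins for base addresses
    for bit in bin(exp)[2:]:
        ram[out] = mont_mul(ram[acc], ram[acc], n, n0_inv)
        acc, out = out, acc                   # swap indices, copy no data
        if bit == "1":
            ram[out] = mont_mul(ram[acc], ram[b], n, n0_inv)
            acc, out = out, acc
    # Multiplying by 1 converts the result out of Montgomery form.
    return from_words(mont_mul(ram[acc], to_words(1, s), n, n0_inv))
```

For example, `mod_exp(4, 13, 497)` can be compared against Python's built-in `pow(4, 13, 497)`. Swapping the `acc` and `out` indices after each multiplication means the result of one step becomes an operand of the next without any word being moved, which is the effect the claims attribute to adjusting base addresses between modular multiplications.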
CN 03156754 2003-09-09 2003-09-09 Method for accelerating RSA encryption/decryption procedure and its analog multiplication and analog power operation circuit CN1259617C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 03156754 CN1259617C (en) 2003-09-09 2003-09-09 Method for accelerating RSA encryption/decryption procedure and its analog multiplication and analog power operation circuit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 03156754 CN1259617C (en) 2003-09-09 2003-09-09 Method for accelerating RSA encryption/decryption procedure and its analog multiplication and analog power operation circuit

Publications (2)

Publication Number Publication Date
CN1492316A (en) 2004-04-28
CN1259617C (en) 2006-06-14

Family

ID=34240840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 03156754 CN1259617C (en) 2003-09-09 2003-09-09 Method for accelerating RSA encryption/decryption procedure and its analog multiplication and analog power operation circuit

Country Status (1)

Country Link
CN (1) CN1259617C (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100539496C (en) 2005-11-09 2009-09-09 浙江大学 Method for treating resuing core of RSA code system
US7725624B2 (en) 2005-12-30 2010-05-25 Intel Corporation System and method for cryptography processing units and multiplier
CN100435091C (en) 2006-03-01 2008-11-19 成都卫士通信息产业股份有限公司 Hardware high-density realizing method for great number modules and power system
CN101170406B (en) 2006-10-27 2010-10-06 北京中电华大电子设计有限责任公司 A realization method for calculation coprocessor based on dual core public key password algorithm
CN101631025B (en) 2009-08-07 2012-07-04 彭艳兵 Arithmetic for quickening encryption and decryption of RSA
CN102646033B (en) * 2011-02-21 2015-08-19 中国科学院信息工程研究所 It provides an implementation of the method and apparatus rsa algorithm of encryption and signature functions
JP6105914B2 (en) 2012-12-10 2017-03-29 キヤノン株式会社 COMMUNICATION APPARATUS, CONTROL METHOD, AND PROGRAM
CN102999313B (en) * 2012-12-24 2016-01-20 飞天诚信科技股份有限公司 One kind of Montgomery modular multiplication based on the data processing method
CN103226461B (en) * 2013-03-26 2016-07-06 中山大学 One kind of Montgomery modular multiplication method and circuit for circuit
CN104573544B (en) * 2013-10-28 2017-09-12 上海复旦微电子集团股份有限公司 Guard method and apparatus of data, RSA modular exponentiation methods, devices and circuits
CN104598199B (en) * 2015-01-07 2018-06-01 大唐微电子技术有限公司 Data processing method and system of the kind Montgomery modular multiplier for smart cards
CN104951279B (en) * 2015-05-27 2018-03-20 四川卫士通信息安全平台技术有限公司 A design method based on the quantized NEON engine's Montgomery Modular Multiplication
US9875104B2 (en) * 2016-02-03 2018-01-23 Google Llc Accessing data in multi-dimensional tensors

Also Published As

Publication number Publication date Type
CN1492316A (en) 2004-04-28 application

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
C14 Grant of patent or utility model
TR01