CN103793199B

CN103793199B - Fast RSA cryptographic coprocessor supporting dual domains

Info

Publication number: CN103793199B
Application number: CN201410035727.3A
Authority: CN
Inventors: 郭炜; 刘绪隆; 魏继增
Original assignee: Tianjin University
Current assignee: Phytium Technology Co Ltd
Priority date: 2014-01-24
Filing date: 2014-01-24
Publication date: 2016-09-07
Anticipated expiration: 2034-01-24
Also published as: CN103793199A

Abstract

A fast RSA cryptographic coprocessor supporting dual domains, comprising: a domain control register for receiving externally input control signals; a control register for receiving externally input control signals; a RAM storage unit for storing externally input operands and operation results; a binary extension field, connected to the output of the domain control register and receiving its control signal; a prime field, connected to the output of the domain control register and receiving its control signal; and a dual-domain modular multiplication unit, connected to the control register, the RAM storage unit, the binary extension field and the prime field, which performs calculations on the external operands stored in the RAM storage unit according to the control signal of the domain control register and stores the results back into the RAM storage unit. The invention avoids a large amount of redundant data write-back, improves RSA encryption and decryption performance, and supports switching between different finite fields, with an area increase of less than 20%.

Description

A Fast RSA Cryptographic Coprocessor Supporting Dual Domains

Technical field

The invention relates to an RSA cryptographic coprocessor, and in particular to a fast RSA cryptographic coprocessor that supports dual domains.

Background

With the development of computer networks and information technology, information security plays an increasingly important role in many fields, and cryptography has become the core of information security technology. RSA is widely recognized as the most mature public-key cryptosystem in both theory and practice; its security rests on the difficulty of factoring large integers. Most public-key encryption and digital signature schemes in use today are based on the RSA algorithm.

Modular exponentiation of large numbers is the core operation of the RSA algorithm. It consists of a series of large-number modular multiplications, where the operands range from several hundred to more than a thousand bits, so the computational load is very high. Modular multiplication is the bottleneck that limits the speed of the computation, and speeding it up is the most fundamental way to improve efficiency. Public-key cryptography is based on finite-field arithmetic, and the prime field and the binary extension field are the finite fields most commonly used in RSA. To realize a fast and configurable RSA algorithm, the system is designed around three modules (arithmetic, storage and control) and the interconnections among them.

Summary of the invention

The technical problem to be solved by the invention is to provide a fast RSA cryptographic coprocessor supporting dual domains that, by cascading its functional units, effectively improves RSA encryption and decryption performance, supports switching between different finite fields, and fully reuses hardware resources.

The technical solution adopted by the invention is a fast RSA cryptographic coprocessor supporting dual domains, comprising:

a domain control register for receiving externally input control signals;

a control register for receiving externally input control signals;

a RAM storage unit for storing externally input operands and operation results;

a binary extension field, connected to the output of the domain control register and receiving the control signal of the domain control register;

a prime field, connected to the output of the domain control register and receiving the control signal of the domain control register;

a dual-domain modular multiplication unit, connected to the control register, the RAM storage unit, the binary extension field and the prime field, for performing calculations on the external operands stored in the RAM storage unit according to the control signal of the domain control register and storing the results back into the RAM storage unit.

The RAM storage unit comprises a first single-port RAM storage unit, a second single-port RAM storage unit and a third single-port RAM storage unit.

The dual-domain modular multiplication unit comprises a state machine unit for emulating the execution of the algorithm, and a multiply-accumulator unit that, by merging the algorithm structures of the two finite fields, unifies the modular multiplication into the operation a+x*y+b.

The state machine unit comprises: a fourth multiplexer, a seventh multiplexer, a first multiplexer, an XOR gate and a third multiplexer, which receive from the RAM storage unit the operand Xi, the operand Yi, the operands Xi and Tj, the operands Ti and Nj, and the operand Zi, respectively; a Ca memory and a Cb memory, each connected to the binary extension field output of the multiply-accumulator unit, which store the accumulated carries at different times; and an X memory, a Y memory and a Z memory for storing operands, connected to the outputs of the first multiplexer, the second multiplexer and the third multiplexer, respectively. The other input of the XOR gate receives the external Inv signal, and the output of the XOR gate is connected to an input of the second multiplexer. The inputs of the first, second and third multiplexers are also connected to the prime field output of the multiply-accumulator unit. The inputs of the third and fourth multiplexers are also connected to the output of the Ca memory. The output of the Cb memory is connected to the inputs of the fourth and fifth multiplexers. The outputs of the X, Y and Z memories are connected to the inputs of the fifth, sixth and seventh multiplexers, respectively, and the other input of the fifth multiplexer receives the constant 1. The outputs of the fourth, fifth, sixth and seventh multiplexers form the outputs of the state machine unit and are connected to the multiply-accumulator unit.

The multiply-accumulator unit consists of a multiply-accumulator whose inputs receive the 64-bit binary addend a, addend b, multiplier X and multiplier Y from the RAM storage unit, and whose outputs deliver the prime field result c and the binary extension field result d. The multiply-accumulator comprises a first adder, a second adder, a third adder, and a dual-domain multiplier that multiplies the received multipliers X and Y and outputs the product to the second adder. The inputs of the first adder receive the binary addends a and b, and its output is connected to the inputs of the second adder and the third adder. The output of the second adder delivers the prime field result c, and the output of the third adder delivers the binary extension field result d.

The dual-domain multiplier comprises 64 half-adder/full-adder arrays connected in series, a Wallace tree connected to the carry outputs of the 64 half-adder/full-adder arrays, and a carry-propagate adder connected to the carry output and the sum output of the Wallace tree. The input of the first half-adder/full-adder array receives the multipliers X and Y from the RAM storage unit; the output of the last half-adder/full-adder array is connected to an input of the carry-propagate adder and to the second adder; and the output of the carry-propagate adder is connected to the third adder.

The fast RSA cryptographic coprocessor supporting dual domains builds on earlier research on the RSA modular exponentiation algorithm and the Montgomery modular multiplication algorithm, and combines it with side-channel countermeasures to realize a dedicated hardware cryptographic acceleration module with a degree of resistance to side-channel attacks. Compared with implementations on general-purpose processors, application-specific integrated circuits and FPGAs, the invention has certain advantages in performance and security. Compared with other RSA encryption hardware, the invention adds dual-domain support and extra data paths and cascades its functional units, which avoids a large amount of redundant data write-back, improves RSA encryption and decryption performance, enables switching between different finite fields, and fully reuses hardware resources. Compared with a cryptographic module that supports only single-field operation, the area increases by less than 20%.

Description of drawings

Fig. 1 is a block diagram of the overall structure of the invention;

Fig. 2 is a logic structure diagram of the dual-domain modular multiplication unit of the invention;

Fig. 3 is a logic structure diagram of the dual-domain multiply-accumulator of the invention;

Fig. 4 is a schematic diagram of the dual-domain multiplier of the invention.

In the figures:

1: Domain control register    2: Control register

3: RAM storage unit    4: Dual-domain modular multiplication unit

5: Binary extension field    6: Prime field

Detailed description

The fast RSA cryptographic coprocessor supporting dual domains of the invention is described in detail below with reference to embodiments and the drawings.

The fast RSA cryptographic coprocessor supporting dual domains uses the Montgomery ladder algorithm at the modular exponentiation layer and the FIOS algorithm at the modular multiplication layer. By studying the modular multiplication and modular exponentiation algorithms together and considering them as a whole, similar operations are mapped onto shared hardware to reduce area; the RAMs in the architecture are specially connected to reduce repeated data movement during modular exponentiation and save data transfer time; and the hardware implementation is made configurable so that encryption and decryption support operations over different finite fields and can meet the needs of different users. To support the two most commonly used finite fields, an efficient 64-bit × 64-bit dual-domain multiplier is designed. In addition, based on a study of side-channel attacks, attack resistance is built into the whole design, from the initial algorithm work through to the later hardware design, so that the hardware can effectively resist power analysis attacks and fault attacks. On this basis, the hardware modular multiplication module is further improved to prevent the modular multiplication from leaking information through its power consumption.
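The Montgomery ladder named above can be illustrated with a minimal Python sketch. It is the textbook form of the ladder, not the coprocessor's implementation: the function name `montgomery_ladder` is hypothetical, and Python's built-in modular multiplication stands in for the hardware modular multiplier. Each exponent bit triggers exactly one multiplication and one squaring, whatever its value.

```python
def montgomery_ladder(base, exp, mod):
    """Compute base**exp % mod with the Montgomery ladder (exp >= 0, mod >= 1)."""
    r0, r1 = 1 % mod, base % mod          # invariant: r1 == r0 * base (mod mod)
    for i in reversed(range(exp.bit_length())):
        if (exp >> i) & 1:
            r0 = (r0 * r1) % mod          # one multiplication ...
            r1 = (r1 * r1) % mod          # ... and one squaring
        else:
            r1 = (r0 * r1) % mod          # same two operations, roles swapped
            r0 = (r0 * r0) % mod
    return r0
```

In a hardware design each of the two products in the loop body would be one call to the modular multiplier.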

The coprocessor has a dedicated instruction set: by accessing the reserved interface and sending specific instructions, the user can dynamically select the finite field used for the computation. So that the system can be integrated easily into an SoC (System on Chip), the invention connects to the outside through single-port RAM interface signals, and all main data paths and RAMs in the system are 64 bits wide.

As shown in Fig. 1, the fast RSA cryptographic coprocessor supporting dual domains comprises: a domain control register 1 for receiving externally input control signals; a control register 2 for receiving externally input control signals; a RAM storage unit 3 for storing externally input operands and operation results; a binary extension field 5, connected to the output of the domain control register 1 and receiving the control signal of the domain control register 1; a prime field 6, connected to the output of the domain control register 1 and receiving the control signal of the domain control register 1; and a dual-domain modular multiplication unit 4, connected to the control register 2, the RAM storage unit 3, the binary extension field 5 and the prime field 6, for performing calculations on the external operands stored in the RAM storage unit 3 according to the control signal of the domain control register 1 and storing the results back into the RAM storage unit 3. Specifically:

The RAM storage unit 3 comprises a first single-port RAM storage unit 31, a second single-port RAM storage unit 32 and a third single-port RAM storage unit 33. The dual-domain modular multiplication unit 4 comprises a state machine unit 41 for emulating the execution of the algorithm, and a multiply-accumulator unit 42 that, by merging the algorithm structures of the two finite fields, unifies the modular multiplication into the operation a+x*y+b.

The state machine unit 41 is designed around the optimized Montgomery algorithm FIOS (finely integrated operand scanning). In this algorithm the operands X, Y and N are split into r-bit words for processing, which suits hardware implementation well and makes efficient use of registers. Moreover, every operation in the algorithm can be reduced to a single type of operation, which saves hardware resources. The algorithm comes in two variants, modular multiplication over the prime field and modular multiplication over the binary extension field, as follows.

1. Modular multiplication over the prime field

The algorithm given in Table 1 is a high-radix Montgomery modular multiplication algorithm, which splits the large operands into words of a small number of bits that take part in the computation one at a time. This patent designs a high-radix modular multiplier with a 64-bit word width.

Table 1. FIOS algorithm over the prime field
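As an illustration of word-serial Montgomery multiplication in the FIOS style, the following Python sketch interleaves the multiplication and reduction steps in a single inner loop and expresses every step through one a + x*y + b primitive. It is a sketch under stated assumptions, not a transcription of Table 1: the names `mac`, `to_words`, `fios_prime`, `ca` and `cb` are hypothetical (the two carry variables are only loosely modelled on the Ca and Cb memories), and operands are passed as Python integers and split into 64-bit words inside the function.

```python
W = 64                      # word width in bits, matching the 64-bit data path
MASK = (1 << W) - 1

def mac(a, x, y, b):
    """Return (high word, low word) of a + x*y + b for W-bit inputs."""
    t = a + x * y + b       # at most 2*W bits for W-bit inputs
    return t >> W, t & MASK

def to_words(v, s):
    """Split integer v into s little-endian W-bit words."""
    return [(v >> (W * j)) & MASK for j in range(s)]

def fios_prime(x, y, n, s):
    """Return x*y*R^-1 mod n with R = 2^(W*s); n odd, x, y < n < R."""
    xw, yw, nw = to_words(x, s), to_words(y, s), to_words(n, s)
    n0_inv = (-pow(n, -1, 1 << W)) & MASK       # -n^-1 mod 2^W (Python 3.8+)
    t = [0] * (s + 1)
    for i in range(s):
        ca = cb = 0                             # carries of the two MAC chains
        m = 0
        for j in range(s):
            ca, lo = mac(t[j], xw[j], yw[i], ca)    # multiplication step
            if j == 0:
                m = (lo * n0_inv) & MASK            # chosen so the low word cancels
            cb, lo = mac(lo, m, nw[j], cb)          # reduction step
            if j > 0:
                t[j - 1] = lo                       # store shifted down one word
        top = t[s] + ca + cb
        t[s - 1], t[s] = top & MASK, top >> W
    r = sum(w << (W * j) for j, w in enumerate(t))
    return r - n if r >= n else r               # final conditional subtraction
```

The result is in Montgomery form, so a caller would typically convert operands into that form first (multiply by R^2 mod n) and convert the final result back (multiply by 1).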

2. Modular multiplication over the binary extension field

Over the binary extension field, all data can be regarded as coefficients of polynomials, so arithmetic follows the rules for polynomial coefficients: addition, for example, becomes bitwise addition modulo 2. The same rule applies when the partial products of a multiplication are added together. Table 2 gives the FIOS algorithm for the binary extension field.
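These rules can be shown in a short Python sketch, with polynomials over GF(2) packed into integers (bit i is the coefficient of x^i). The helper names `gf2_add` and `gf2_mul` are hypothetical, and the product is left unreduced, that is, no modulus is applied.

```python
def gf2_add(a, b):
    """Add two GF(2) polynomials: bitwise addition modulo 2, no carries."""
    return a ^ b

def gf2_mul(a, b):
    """Carry-less product: the partial products are combined with XOR."""
    result = 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            result = gf2_add(result, a << shift)   # partial product a * x^shift
        shift += 1
    return result

# (x + 1) * (x + 1) = x^2 + 1 over GF(2), whereas 3 * 3 = 9 over the integers
print(bin(gf2_mul(0b11, 0b11)))   # 0b101
```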

Table 2. FIOS algorithm over the binary extension field

3. Comparison of the algorithms for the two fields

The FIOS algorithm has essentially the same structure over the prime field and the binary extension field. Apart from the difference in the basic addition and multiplication rules, there are two further differences:

3.1. Over the binary extension field, the modulus N is usually longer than the multipliers, typically by 2 bits. For example, a 256-bit modular multiplication has a 258-bit modulus in which the most significant of the extra bits is 1, so the modulus N has 2 more bits (value 0x2) than in the prime field case. These extra 2 bits must therefore be included in the calculation in the last iteration of the inner (second-level) loop of the algorithm (step 6 in Table 2).

3.2. Operations over the binary extension field produce no carries, so the subtraction in the final step is never executed and can be removed.
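Point 3.2 can be seen in a minimal Python sketch of Montgomery multiplication over GF(2): because XOR produces no carries, the degree of the running value stays below the degree of the modulus and no final subtraction appears. This is an illustrative sketch, not Table 2: all function names are hypothetical, and the modulus is passed as one Python integer including its leading coefficient, so the separate handling of the extra top bits described in 3.1 is not modelled. `poly_mod` is included only as a reference for checking results.

```python
W = 64
MASK = (1 << W) - 1

def clmul(a, b):
    """Carry-less (GF(2) polynomial) product of two integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def poly_inv_mod_xw(n):
    """Inverse of polynomial n modulo x^W; n must have constant term 1."""
    inv = 1
    for i in range(1, W):
        if (clmul(n, inv) >> i) & 1:    # coefficient i of n*inv should be 0
            inv |= 1 << i               # adding x^i to inv flips exactly that bit
    return inv

def mont_mul_gf2(x, y, n, s):
    """Return x*y*R^-1 mod n over GF(2), R = x^(W*s), deg x, deg y < deg n <= W*s."""
    n0_inv = poly_inv_mod_xw(n)
    t = 0
    for i in range(s):
        y_i = (y >> (W * i)) & MASK
        t ^= clmul(x, y_i)                      # multiplication step, XOR instead of add
        m = clmul(t & MASK, n0_inv) & MASK      # makes the low W coefficients cancel
        t ^= clmul(m, n)                        # reduction step
        t >>= W                                 # exact division by x^W
    return t                                    # degree already below deg n

def poly_mod(a, n):
    """Reference helper: remainder of a divided by n over GF(2)."""
    d = n.bit_length()
    while a.bit_length() >= d:
        a ^= n << (a.bit_length() - d)
    return a
```

A result r can be checked against the defining identity r * R = x * y (mod n) using `clmul` and `poly_mod`.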

4. Architecture of the dual-domain modular multiplier

By merging the algorithm structures of the two finite fields, the modular multiplication is unified into the operation a+x*y+b. This allows the arithmetic resources to be reused efficiently, greatly saves hardware resources, and reduces the hardware area. Fig. 2 shows the logic structure of the dual-domain modular multiplier.

As shown in Fig. 2, the state machine unit 41 comprises: a fourth multiplexer 415, a seventh multiplexer 418, a first multiplexer 412, an XOR gate 413 and a third multiplexer 414, which receive from the RAM storage unit 3 the operand Xi, the operand Yi, the operands Xi and Tj, the operands Ti and Nj, and the operand Zi, respectively; a Ca memory 419 and a Cb memory 4120, each connected to the binary extension field output of the multiply-accumulator unit 42, which store the accumulated carries at different times; and an X memory 421, a Y memory 4122 and a Z memory 4123 for storing operands, connected to the outputs of the first multiplexer 412, the second multiplexer 413 and the third multiplexer 414, respectively. The other input of the XOR gate 413 receives the external Inv signal, and the output of the XOR gate is connected to an input of the second multiplexer 413. The inputs of the first multiplexer 412, the second multiplexer 413 and the third multiplexer 414 are also connected to the prime field output of the multiply-accumulator unit 42. The inputs of the third multiplexer 414 and the fourth multiplexer 415 are also connected to the output of the Ca memory 419. The output of the Cb memory 4120 is connected to the inputs of the fourth multiplexer 415 and the fifth multiplexer 416. The outputs of the X memory 421, the Y memory 4122 and the Z memory 4123 are connected to the inputs of the fifth multiplexer 416, the sixth multiplexer 417 and the seventh multiplexer 418, respectively, and the other input of the fifth multiplexer 416 receives the constant 1. The outputs of the fourth multiplexer 415, the fifth multiplexer 416, the sixth multiplexer 417 and the seventh multiplexer 418 form the outputs of the state machine unit 41 and are connected to the multiply-accumulator unit 42.

Reducing the number of divisions in a computation is an effective way to increase its speed. The modular multiplication algorithm proposed by Montgomery in 1985 quickly replaced the classical modular reduction algorithm. The Montgomery algorithm does not rely on comparison and division of long integers; instead, it represents numbers as residues modulo N and converts the reduction modulo N into a division by a power of 2, which in hardware is simply a shift. This makes the algorithm very convenient to implement in hardware, and it is therefore the most widely used.
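How the reduction modulo N turns into a shift can be sketched in a few lines of Python. This is the standard Montgomery reduction (REDC) on whole integers rather than the word-serial hardware form; the function name `redc` and the parameter `k` (so that R = 2^k) are assumptions made for the illustration.

```python
def redc(T, N, k):
    """Return T * 2^-k mod N for odd N and 0 <= T < N * 2^k."""
    r_mask = (1 << k) - 1
    n_prime = (-pow(N, -1, 1 << k)) & r_mask    # -N^-1 mod 2^k (Python 3.8+)
    m = ((T & r_mask) * n_prime) & r_mask       # "mod 2^k" is just a bit mask
    t = (T + m * N) >> k                        # low k bits are zero: exact shift
    return t - N if t >= N else t               # at most one subtraction
```

No division by N occurs anywhere: the only modular operations are masks and a shift by k bits, plus a single conditional subtraction at the end.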

Basic addition and multiplication differ significantly between the prime field and the binary extension field. The key point is that operations over the binary extension field are polynomial operations, which, unlike ordinary arithmetic, produce no carries. Data in the binary extension field can be regarded as the coefficients of a polynomial, so addition is polynomial addition: following the rule that only terms of the same degree are added, only bits in the same position are added, with no carry, and the addition is modulo 2. Addition over the binary extension field can therefore be expressed as a bitwise XOR of the data in binary form. Since a multiplication can be decomposed into a sum of partial products, the binary extension field product is obtained by separating out the XOR result while the partial products are being added; adding the carries generated during the addition back in then gives the ordinary product. The structure of the 64-bit multiply-accumulator is shown in Fig. 3, and the principle of the dual-domain multiplier is shown in Fig. 4.
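The idea of obtaining both products from one pass over the partial products can be modelled behaviourally in Python, using the identity a + b = (a XOR b) + 2*(a AND b) for each half-adder row. The function name `dual_field_mul` is hypothetical, and whether the hardware of Fig. 4 collects its carries in exactly this way is an assumption; the sketch only shows that the XOR output plus the collected carries equals the ordinary product.

```python
def dual_field_mul(x, y, width=64):
    """Return (integer product, GF(2) polynomial product) for y < 2**width."""
    xor_acc = 0          # running XOR of the partial products (carry-free sum)
    carries = []         # carry vector produced by each row
    for i in range(width):
        pp = (x << i) if (y >> i) & 1 else 0     # i-th partial product
        carries.append((xor_acc & pp) << 1)      # half-adder carries, weight 2
        xor_acc ^= pp                            # half-adder sums
    gf2_product = xor_acc                        # product over the binary field
    int_product = xor_acc + sum(carries)         # carries added back in
    return int_product, gf2_product

print(dual_field_mul(3, 3))   # (9, 5): 3*3 = 9, (x+1)^2 = x^2 + 1
```

In this model the summation of the carry vectors plays the role that a carry-save tree followed by a carry-propagate adder would play in hardware.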

As shown in Fig. 3, the multiply-accumulator unit 42 consists of a multiply-accumulator whose inputs receive the 64-bit binary addend a, addend b, multiplier X and multiplier Y from the RAM storage unit 3, and whose outputs deliver the prime field result c and the binary extension field result d. The multiply-accumulator comprises a first adder 421, a second adder 422, a third adder 423, and a dual-domain multiplier 424 that multiplies the received multipliers X and Y and outputs the product to the second adder 422. The inputs of the first adder 421 receive the binary addends a and b, and its output is connected to the inputs of the second adder 422 and the third adder 423. The output of the second adder 422 delivers the prime field result c, and the output of the third adder 423 delivers the binary extension field result d.

As shown in Fig. 4, the dual-domain multiplier 424 comprises 64 half-adder/full-adder arrays 4241 connected in series, a Wallace tree 4242 connected to the carry outputs of the 64 half-adder/full-adder arrays 4241, and a carry-propagate adder 4243 connected to the carry output and the sum output of the Wallace tree 4242. The input of the first of the 64 half-adder/full-adder arrays 4241 receives the multipliers X and Y from the RAM storage unit 3; the output of the last half-adder/full-adder array is connected to an input of the carry-propagate adder 4243 and to the second adder 422; and the output of the carry-propagate adder 4243 is connected to the third adder 423.

Claims

1. A fast RSA cryptographic coprocessor supporting dual domains, comprising:

a domain control register (1) for receiving an externally input control signal;

a control register (2) for receiving an externally input control signal;

a RAM storage unit (3) for storing externally input operands and operation results;

a binary extension field (5), connected to the output of the domain control register (1) and receiving the control signal of the domain control register (1);

a prime field (6), connected to the output of the domain control register (1) and receiving the control signal of the domain control register (1);

a dual-domain modular multiplication unit (4), connected to the control register (2), the RAM storage unit (3), the binary extension field (5) and the prime field (6), for performing calculations on the external operands stored in the RAM storage unit (3) according to the control signal of the domain control register (1) and storing the results back into the RAM storage unit (3);

wherein the dual-domain modular multiplication unit (4) comprises a state machine unit (41) for emulating the execution of the algorithm, and a multiply-accumulator unit (42) that, by merging the algorithm structures of the two finite fields, unifies the modular multiplication into the operation a+x*y+b; the state machine unit (41) comprises a fourth multiplexer (415), a seventh multiplexer (418), a first multiplexer (412), an XOR gate (413) and a third multiplexer (414), which receive from the RAM storage unit (3) the operand Xi, the operand Yi, the operands Xi and Tj, the operands Ti and Nj, and the operand Zi, respectively; a Ca memory (419) and a Cb memory (4120), each connected to the binary extension field output of the multiply-accumulator unit (42), which store the accumulated carries at different times; and an X memory (421), a Y memory (4122) and a Z memory (4123) for storing operands, connected to the outputs of the first multiplexer (412), the second multiplexer (413) and the third multiplexer (414), respectively; wherein the other input of the XOR gate (413) receives an external Inv signal and the output of the XOR gate is connected to an input of the second multiplexer (413); the inputs of the first multiplexer (412), the second multiplexer (413) and the third multiplexer (414) are also connected to the prime field output of the multiply-accumulator unit (42); the inputs of the third multiplexer (414) and the fourth multiplexer (415) are also connected to the output of the Ca memory (419); the output of the Cb memory (4120) is connected to the inputs of the fourth multiplexer (415) and the fifth multiplexer (416); the outputs of the X memory (421), the Y memory (4122) and the Z memory (4123) are connected to the inputs of the fifth multiplexer (416), the sixth multiplexer (417) and the seventh multiplexer (418), respectively; the other input of the fifth multiplexer (416) receives the constant 1; and the outputs of the fourth multiplexer (415), the fifth multiplexer (416), the sixth multiplexer (417) and the seventh multiplexer (418) form the outputs of the state machine unit (41) and are connected to the multiply-accumulator unit (42).

2. The fast RSA cryptographic coprocessor supporting dual domains as claimed in claim 1, characterized in that the RAM storage unit (3) comprises a first single-port RAM storage unit (31), a second single-port RAM storage unit (32) and a third single-port RAM storage unit (33).

3. The fast RSA cryptographic coprocessor supporting dual domains as claimed in claim 1, characterized in that the multiply-accumulator unit (42) consists of a multiply-accumulator whose inputs receive the 64-bit binary addend a, addend b, multiplier X and multiplier Y from the RAM storage unit (3), and whose outputs deliver the prime field result c and the binary extension field result d; the multiply-accumulator comprises a first adder (421), a second adder (422), a third adder (423), and a dual-domain multiplier (424) that multiplies the received multipliers X and Y and outputs the product to the second adder (422); the inputs of the first adder (421) receive the binary addends a and b, and its output is connected to the inputs of the second adder (422) and the third adder (423); the output of the second adder (422) delivers the prime field result c, and the output of the third adder (423) delivers the binary extension field result d.

4. The fast RSA cryptographic coprocessor supporting dual domains as claimed in claim 1, characterized in that the dual-domain multiplier (424) comprises 64 half-adder/full-adder arrays (4241) connected in series, a Wallace tree (4242) connected to the carry outputs of the 64 half-adder/full-adder arrays (4241), and a carry-propagate adder (4243) connected to the carry output and the sum output of the Wallace tree (4242), wherein the input of the first of the 64 half-adder/full-adder arrays (4241) receives the multipliers X and Y from the RAM storage unit (3), the output of the last half-adder/full-adder array is connected to an input of the carry-propagate adder (4243) and to the second adder (422), and the output of the carry-propagate adder (4243) is connected to the third adder (423).