Summary of the invention
The technical problem to be solved by the present invention is to provide a fast RSA cryptographic coprocessor supporting dual fields that exploits the cascading between functional units, effectively improves RSA encryption and decryption performance, implements switching between different finite fields, and fully reuses hardware resources.
The technical solution adopted by the present invention is a fast RSA cryptographic coprocessor supporting dual fields, comprising:
a field control register, for receiving an externally input control signal;
a control register, for receiving an externally input control signal;
a RAM storage unit, for storing externally input operands and operation results;
a binary extension field, connected to the output of the field control register and receiving the control signal of the field control register;
a prime field, connected to the output of the field control register and receiving the control signal of the field control register;
a dual-field modular multiplication unit, connected respectively to the control register, the RAM storage unit, the binary extension field and the prime field, for performing calculations on the external operands stored in the RAM storage unit according to the control signal of the field control register, and storing the calculation results back into the RAM storage unit.
The RAM storage unit comprises a first single-port RAM storage unit, a second single-port RAM storage unit and a third single-port RAM storage unit.
The dual-field modular multiplication unit comprises a state machine unit for emulating the execution of the algorithm, and a multiply-accumulator unit which, by merging the algorithm structures of the two different finite fields, unifies the modular multiplication operations into the form a+x*y+b.
The state machine unit comprises, each correspondingly receiving an operand output from the RAM storage unit, a fourth multiplexer for operand Xi, a seventh multiplexer for operand Yi, a first multiplexer for operands Xi and Tj, an XOR gate for operands Ti and Nj, and a third multiplexer for operand Zi. It is further provided with a Ca memory and a Cb memory, each connected to the binary extension field output of the multiply-accumulator unit and storing the accumulated carries of different times, and with an X memory, a Y memory and a Z memory for storing operands, connected correspondingly to the outputs of the first multiplexer, the second multiplexer and the third multiplexer. The other input of the XOR gate receives an external Inv signal, and its output is connected to the input of the second multiplexer. The inputs of the first multiplexer, the second multiplexer and the third multiplexer are also each connected to the prime field output of the multiply-accumulator unit. The inputs of the third multiplexer and the fourth multiplexer are also connected to the output of the Ca memory. The output of the Cb memory is connected to the inputs of the fourth multiplexer and the fifth multiplexer, respectively. The outputs of the X memory, the Y memory and the Z memory are connected correspondingly to the inputs of the fifth multiplexer, the sixth multiplexer and the seventh multiplexer, and another input of the fifth multiplexer receives the constant 1. The outputs of the fourth, fifth, sixth and seventh multiplexers together constitute the outputs of the state machine unit and are connected to the multiply-accumulator unit.
The multiply-accumulator unit consists of a multiply-accumulator whose inputs respectively receive the 64-bit binary addend a, addend b, multiplier X and multiplier Y input from the RAM storage unit, and whose outputs respectively output the prime field result c and the binary extension field result d. The multiply-accumulator comprises a first adder, a second adder, a third adder, and a dual-field multiplier that multiplies the received multiplier X by the received multiplier Y and outputs the result to the second adder. The inputs of the first adder respectively receive the binary addend a and addend b, and its output is connected to the inputs of the second adder and the third adder, respectively. The output of the second adder outputs the prime field result c, and the output of the third adder outputs the binary extension field result d.
The dual-field multiplier comprises 64 half-adder/full-adder arrays connected in series, a Wallace tree connected to the carry outputs of the 64 half-adder/full-adder arrays, and a carry-propagate adder connected respectively to the carry output and the sum output of the Wallace tree. The input of the first of the 64 half-adder/full-adder arrays receives the multiplier X and multiplier Y input from the RAM storage unit, the output of the last half-adder/full-adder array is connected respectively to the input of the carry-propagate adder and to the second adder, and the output of the carry-propagate adder is connected to the third adder.
The fast RSA cryptographic coprocessor supporting dual fields of the present invention builds on prior research into RSA modular exponentiation algorithms and the Montgomery modular multiplication algorithm and combines it with methods for preventing side-channel attacks, thereby realizing a dedicated hardware cryptographic acceleration module with a certain degree of resistance to side-channel attacks. Compared with implementations such as general-purpose processors, application-specific integrated circuits and FPGAs, the present invention has certain advantages in performance and security. Compared with other RSA encryption hardware, the present invention adds dual-field support and extends additional data paths; by exploiting the cascading between functional units, it avoids a large number of redundant data write-backs, improves RSA encryption and decryption performance, implements switching between different finite fields, and fully reuses hardware resources. Compared with a cryptographic module that supports only single-field operations, the area increases by less than 20%, so the benefit is significant.
Detailed description of the invention
The fast RSA cryptographic coprocessor supporting dual fields of the present invention is described in detail below with reference to embodiments and the accompanying drawings.
The fast RSA cryptographic coprocessor supporting dual fields of the present invention adopts the Montgomery ladder algorithm at the modular exponentiation layer and the FIOS algorithm at the modular multiplication layer. Through comprehensive study and overall consideration of the modular multiplication and modular exponentiation algorithms, similar operations within the computation share hardware in order to reduce area; the RAMs in the architecture are specially connected to reduce repeated data movement during modular exponentiation, saving data transfer time; and the hardware implementation is made configurable so that encryption and decryption support operations over different finite fields, thereby meeting the needs of different users. To support the two most commonly used finite fields, an efficient 64bit*64bit dual-field multiplier is designed. In addition, based on research into side-channel attacks, attack resistance is carried through the whole design, from the initial algorithm research to the later hardware design, so that the hardware design can effectively prevent power analysis attacks and fault attacks. On this basis, the design of the hardware modular multiplication module is improved so as to eliminate the risk of the modular multiplication leaking information through power consumption.
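The Montgomery ladder named above can be illustrated in software. The following is a minimal sketch only, not the claimed hardware: the function name is hypothetical, ordinary integer modular multiplication stands in for the hardware modular multiplier, and a Python `if` branch is not constant-time, so the sketch shows only the regular structure of the ladder (one multiplication and one squaring per exponent bit, whatever the bit value).

```python
def montgomery_ladder_pow(base: int, exponent: int, modulus: int) -> int:
    """Compute base**exponent mod modulus with the Montgomery powering ladder.

    Invariant at every step: r1 == r0 * base (mod modulus).
    """
    r0 = 1 % modulus
    r1 = base % modulus
    for i in reversed(range(exponent.bit_length())):
        if (exponent >> i) & 1:
            # Bit is 1: r0 <- r0*r1, r1 <- r1^2
            r0 = (r0 * r1) % modulus
            r1 = (r1 * r1) % modulus
        else:
            # Bit is 0: r1 <- r0*r1, r0 <- r0^2
            r1 = (r0 * r1) % modulus
            r0 = (r0 * r0) % modulus
    return r0
```

Because both branches perform the same pair of operations, the sequence of multiplications and squarings does not depend on the key bits, which is the property usually relied on against simple power analysis.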
The fast RSA cryptographic coprocessor supporting dual fields of the present invention is designed with a dedicated instruction set; by accessing the reserved interface and sending specific instructions, the user can dynamically change the finite field in which the computation is performed. So that the system can be conveniently integrated into an SoC (System on Chip), the present invention uses single-port RAM interface signals for external interconnection, and all key data and RAMs in the system are 64 bits wide.
As shown in Fig. 1, the fast RSA cryptographic coprocessor supporting dual fields of the present invention comprises: a field control register 1, for receiving an externally input control signal; a control register 2, for receiving an externally input control signal; a RAM storage unit 3, for storing externally input operands and operation results; a binary extension field 5, connected to the output of the field control register 1 and receiving the control signal of the field control register 1; a prime field 6, connected to the output of the field control register 1 and receiving the control signal of the field control register 1; and a dual-field modular multiplication unit 4, connected respectively to the control register 2, the RAM storage unit 3, the binary extension field 5 and the prime field 6, for performing calculations on the external operands stored in the RAM storage unit 3 according to the control signal of the field control register 1, and storing the calculation results back into the RAM storage unit 3. Wherein:
The RAM storage unit 3 comprises a first single-port RAM storage unit 31, a second single-port RAM storage unit 32 and a third single-port RAM storage unit 33. The dual-field modular multiplication unit 4 comprises a state machine unit 41 for emulating the execution of the algorithm, and a multiply-accumulator unit 42 which, by merging the algorithm structures of the two different finite fields, unifies the modular multiplication operations into the form a+x*y+b.
The state machine unit 41 of the present invention is designed using the optimized Montgomery algorithm FIOS (Finely Integrated Operand Scanning). The optimized Montgomery algorithm splits the operands X, Y and N into r-bit words for computation, which suits hardware implementation well and makes efficient use of registers. Moreover, all operations in the algorithm can be reduced to a single type of operation, which helps save hardware resources. The optimized Montgomery algorithm includes a modular multiplication algorithm over the prime field and a modular multiplication algorithm over the binary extension field, as follows.
1. Modular multiplication algorithm over the prime field
The algorithm given in Table 1 is a high-radix Montgomery algorithm: large-number operands are divided into blocks of smaller words that take part in the computation. The present patent designs a high-radix modular multiplier with a 64-bit word width.
Table 1. FIOS algorithm over the prime field
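For orientation, the word-level structure of a FIOS-style Montgomery multiplication can be sketched in software. This is a generic textbook-style sketch under stated assumptions, not the algorithm of Table 1: the names `mac` and `fios_montmul` are hypothetical, two separate carry chains are kept so that every step has the form a + x*y + b, and the modulus n is assumed to be odd and to fit in s words of 64 bits.

```python
W = 64                      # word width in bits (64-bit datapath)
MASK = (1 << W) - 1

def mac(a, x, y, b):
    """One unified step a + x*y + b; returns (carry word, sum word)."""
    acc = a + x * y + b     # at most 2**128 - 1 when all inputs are 64-bit
    return acc >> W, acc & MASK

def fios_montmul(x, y, n, s):
    """Return x*y*R^-1 mod n, with R = 2**(W*s), n odd, 0 <= x, y < n < R."""
    n0_inv = (-pow(n, -1, 1 << W)) & MASK          # -n^-1 mod 2**W
    xw = [(x >> (W * k)) & MASK for k in range(s)]
    yw = [(y >> (W * k)) & MASK for k in range(s)]
    nw = [(n >> (W * k)) & MASK for k in range(s)]
    t = [0] * (s + 1)                              # running result, s+1 words
    for i in range(s):
        # Word 0: multiply, derive the reduction factor m, then reduce.
        c1, low = mac(t[0], xw[0], yw[i], 0)
        m = (low * n0_inv) & MASK
        c2, low = mac(low, m, nw[0], 0)            # low word is now zero
        # Remaining words: multiplication and reduction in one inner loop,
        # storing each result one word lower (the division by 2**W).
        for j in range(1, s):
            c1, low = mac(t[j], xw[j], yw[i], c1)
            c2, low = mac(low, m, nw[j], c2)
            t[j - 1] = low
        acc = t[s] + c1 + c2
        t[s - 1] = acc & MASK
        t[s] = acc >> W
    result = sum(word << (W * k) for k, word in enumerate(t))
    return result - n if result >= n else result   # final conditional subtraction
```

A usage example would convert operands into Montgomery form (multiply by R modulo n) before calling `fios_montmul`, and convert back by multiplying with 1.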
2. Modular multiplication algorithm over the binary extension field
Over the binary extension field, all data can be regarded as polynomial coefficients, so operations on the data become operations on polynomial coefficients; for example, addition becomes bitwise modulo-2 addition. Correspondingly, the partial products in a multiplication are summed according to the same rule. Table 2 gives the FIOS algorithm supporting the binary extension field.
Table 2. FIOS algorithm over the binary extension field
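The carry-free counterpart can likewise be sketched in software. This is a generic word-serial Montgomery multiplication over GF(2)[x], not the algorithm of Table 2: polynomials are held as Python integers (bit k is the coefficient of x^k), all function names are hypothetical, and the modulus polynomial f is assumed to have constant term 1. The sketch does not model the 2 extra modulus bits discussed in the text.

```python
def clmul(a, b):
    """Carry-less product of two polynomials over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def clmod(a, f):
    """Remainder of polynomial a modulo polynomial f over GF(2)."""
    while a.bit_length() >= f.bit_length():
        a ^= f << (a.bit_length() - f.bit_length())
    return a

def clinv_mod_xw(f0, w):
    """Inverse of f0 modulo x**w over GF(2); requires f0 to be odd."""
    inv = 1
    for i in range(1, w):
        if (clmul(f0, inv) >> i) & 1:   # fix bit i of the product
            inv |= 1 << i
    return inv

def gf2_montmul(a, b, f, s, w=64):
    """Return a*b*x^(-w*s) mod f, for deg a, deg b < deg f and f(0) = 1."""
    mask = (1 << w) - 1
    f0_inv = clinv_mod_xw(f & mask, w)
    t = 0
    for i in range(s):
        b_i = (b >> (w * i)) & mask
        t ^= clmul(a, b_i)                    # addition is XOR, no carries
        m = clmul(t & mask, f0_inv) & mask    # reduction factor for this word
        t ^= clmul(f, m)                      # low word of t becomes zero
        t >>= w                               # exact division by x**w
    return t                                  # degree already below deg f
```

Since no step can carry, the result never exceeds the degree bound and the final conditional subtraction of the prime-field version has no counterpart here.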
3. Comparison of the algorithms over the two fields
The structure of the FIOS algorithm is essentially the same over the prime field and over the binary extension field. Apart from the different rules for basic addition and multiplication in the two fields, there are two further differences:
3.1. Over the binary extension field, the bit length of the modulus N usually exceeds that of the multiplier, usually by 2 bits. For example, the modulus of a 256-bit modular multiplication is 258 bits long and its highest excess bit is 1, so compared with the prime field the modulus N has 2 extra bits (value 0x2). These 2 extra bits must therefore be included in the computation during the last iteration of the second-level loop of the algorithm (see step 6 in Table 2).
3.2. Over the binary extension field, operations produce no carry, so the subtraction in the final step is never executed and can be removed directly.
4. Architecture of the dual-field modular multiplier
By merging the algorithm structures of the two different finite fields, the modular multiplication operations are unified into the form a+x*y+b. This facilitates efficient reuse of computing resources, greatly saves hardware resources and optimizes hardware area. Fig. 2 shows the logical structure of the dual-field modular multiplier.
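The unified form a+x*y+b can be illustrated with a small software model. This is a behavioral sketch only: the function names and the `binary_field` flag are hypothetical stand-ins for the field selection, and the sketch makes no claim about how the hardware shares its adders.

```python
W = 64
MASK = (1 << W) - 1

def clmul(x, y):
    """Carry-less (polynomial over GF(2)) product of x and y."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        x <<= 1
        y >>= 1
    return r

def dual_field_mac(a, x, y, b, binary_field):
    """Evaluate a + x*y + b in the selected field on 64-bit words.

    Returns (high word, low word) of the double-width result.
    """
    if binary_field:
        acc = a ^ clmul(x, y) ^ b      # additions are XOR, no carries
    else:
        acc = a + x * y + b            # never exceeds 2**128 - 1
    return acc >> W, acc & MASK
```

In the prime-field case the largest possible result, with all four inputs equal to 2^64 - 1, is exactly 2^128 - 1, so the result always fits in two 64-bit words; this is why both addends can be absorbed into one multiply-accumulate step.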
As shown in Fig. 2, the state machine unit 41 of the present invention comprises, each correspondingly receiving an operand output from the RAM storage unit 3, a fourth multiplexer 415 for operand Xi, a seventh multiplexer 418 for operand Yi, a first multiplexer 412 for operands Xi and Tj, an OR gate 413 for operands Ti and Nj, and a third multiplexer 414 for operand Zi. It is further provided with a Ca memory 419 and a Cb memory 4120, each connected to the binary extension field output of the multiply-accumulator unit 42 and storing the accumulated carries of different times, and with an X memory 421, a Y memory 4122 and a Z memory 4123 for storing operands, connected correspondingly to the outputs of the first multiplexer 412, the second multiplexer 413 and the third multiplexer 414. The other input of the OR gate 413 receives an external Inv signal, and its output is connected to the input of the second multiplexer 413. The inputs of the first multiplexer 412, the second multiplexer 413 and the third multiplexer 414 are also each connected to the prime field output of the multiply-accumulator unit 42. The inputs of the third multiplexer 414 and the fourth multiplexer 415 are also connected to the output of the Ca memory 419. The output of the Cb memory 4120 is connected to the inputs of the fourth multiplexer 415 and the fifth multiplexer 416, respectively. The outputs of the X memory 421, the Y memory 4122 and the Z memory 4123 are connected correspondingly to the inputs of the fifth multiplexer 416, the sixth multiplexer 417 and the seventh multiplexer 418, and another input of the fifth multiplexer 416 receives the constant 1. The outputs of the fourth multiplexer 415, the fifth multiplexer 416, the sixth multiplexer 417 and the seventh multiplexer 418 together constitute the outputs of the state machine unit 41 and are connected to the multiply-accumulator unit 42.
Reducing the number of divisions in the computation is an effective way to increase computing speed. In 1985, Montgomery proposed a modular multiplication algorithm that quickly replaced the classical modular reduction algorithm. The Montgomery algorithm does not rely on the comparison and division of long integers; instead, all numbers are represented by their residues modulo N, and the reduction modulo N is converted into a division by a power of 2, which in a hardware implementation is simply a shift operation. It is an algorithm very well suited to hardware implementation and is therefore the most widely used.
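The replacement of division by a shift can be seen in a short software sketch of Montgomery reduction. The function name and the parameter k (so that R = 2^k) are illustrative assumptions; the modulus n must be odd, and the three-argument `pow` with exponent -1 requires Python 3.8 or later.

```python
def montgomery_reduce(t, n, k):
    """Return t * 2^(-k) mod n for odd n and 0 <= t < n * 2**k.

    The only "divisions" are a mask (mod 2**k) and a right shift (div 2**k).
    """
    r_mask = (1 << k) - 1
    n_prime = (-pow(n, -1, 1 << k)) & r_mask   # -n^-1 mod 2**k, precomputable
    m = ((t & r_mask) * n_prime) & r_mask      # makes t + m*n divisible by 2**k
    u = (t + m * n) >> k                       # exact division, done as a shift
    return u - n if u >= n else u
```

For example, with n = 97 and k = 8, the operands 5 and 7 are first mapped to 5*256 mod 97 and 7*256 mod 97; reducing their product gives 35 in Montgomery form, and one further reduction returns the ordinary value 35.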
Basic addition and multiplication differ markedly between the prime field and the binary extension field. The key point is that operations over the binary extension field are polynomial operations which, unlike conventional arithmetic, produce no carry. Data over the binary extension field can be regarded as the coefficients of the corresponding polynomial, so addition can be regarded as polynomial addition: by the rule of adding like terms, only coefficients at the same position are added, no carry arises, and the addition is modulo 2. Addition over the binary extension field can therefore be expressed as a bitwise XOR of the data in binary form. Since a multiplication decomposes into a sum of partial products, the product over the binary extension field can be obtained by separating out the XOR results while the partial products are being added; adding back the carries produced during the addition then yields the ordinary product. The structure of the 64-bit multiply-accumulator is shown in Fig. 3, and the principle of the dual-field multiplier is shown in Fig. 4.
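The relationship between the two products can be checked with a short software model. This is a behavioral sketch with a hypothetical function name: it relies on the identity p + q = (p XOR q) + 2*(p AND q), keeps the XOR part and the carry part separate while the partial products are accumulated, and uses ordinary integer addition where hardware would use a compression tree followed by a carry-propagate adder.

```python
def dual_field_multiply(x, y):
    """Return (binary extension field product, ordinary integer product).

    Invariant: the integer sum of the partial products processed so far
    equals xor_acc + carry_acc.
    """
    xor_acc = 0      # XOR of partial products: the carry-free product
    carry_acc = 0    # carries separated out during the accumulation
    shift = 0
    while y >> shift:
        if (y >> shift) & 1:
            partial = x << shift
            carry_acc += (xor_acc & partial) << 1   # carries this addition would generate
            xor_acc ^= partial                      # carry-free sum
        shift += 1
    return xor_acc, xor_acc + carry_acc             # add the carries back
```

For instance, multiplying 3 by 3 gives the carry-free product 5 (the polynomial x^2 + 1) together with the accumulated carry 4, and 5 + 4 = 9 is the ordinary product.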
As shown in Fig. 3, the multiply-accumulator unit 42 consists of a multiply-accumulator whose inputs respectively receive the 64-bit binary addend a, addend b, multiplier X and multiplier Y input from the RAM storage unit 3, and whose outputs respectively output the prime field result c and the binary extension field result d. The multiply-accumulator comprises a first adder 421, a second adder 422, a third adder 423, and a dual-field multiplier 424 that multiplies the received multiplier X by the received multiplier Y and outputs the result to the second adder 422. The inputs of the first adder 421 respectively receive the binary addend a and addend b, and its output is connected to the inputs of the second adder 422 and the third adder 423, respectively. The output of the second adder 422 outputs the prime field result c, and the output of the third adder 423 outputs the binary extension field result d.
As shown in Fig. 4, the dual-field multiplier 424 comprises 64 half-adder/full-adder arrays 4241 connected in series, a Wallace tree 4242 connected to the carry outputs of the 64 half-adder/full-adder arrays 4241, and a carry-propagate adder 4243 connected respectively to the carry output and the sum output of the Wallace tree 4242. The input of the first of the 64 half-adder/full-adder arrays 4241 receives the multiplier X and multiplier Y input from the RAM storage unit 3, the output of the last half-adder/full-adder array is connected respectively to the input of the carry-propagate adder 4243 and to the second adder 422, and the output of the carry-propagate adder 4243 is connected to the third adder 423.