CN100573540C

CN100573540C - Design method for an asynchronous block cipher algorithm coprocessor

Info

Publication number: CN100573540C
Application number: CNB200810143205XA
Authority: CN
Inventors: 王志英; 童元满; 陆洪毅; 任江春; 王蕾; 戴葵; 龚锐; 石伟; 阮坚; 李勇
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2008-09-16
Filing date: 2008-09-16
Publication date: 2009-12-23
Anticipated expiration: 2028-09-16
Also published as: CN101350038A

Abstract

The invention discloses a design method for an asynchronous block cipher algorithm coprocessor; the technical problem to be solved is how to design such a coprocessor. The technical scheme is: treat each round of iteration in the block cipher as an independent submodule; design each submodule in an HDL and perform logic synthesis on it to obtain a static single-rail netlist; convert the static single-rail netlist into a compound logic netlist composed only of complementary two-input AND gates and OR gates; perform delay matching on each submodule by adding a delay matching module whose delay equals that of the submodule, ensuring that the delay from the input signals to the output signals of each submodule is identical and that the two inputs of every AND gate or OR gate arrive at the same time; connect the submodules in sequence to obtain the complete netlist; and perform back-end place and route to obtain the GDS layout. A coprocessor designed with this method has relatively high resistance to power analysis attacks, together with high computational performance and low power consumption.

Description

A design method for an asynchronous block cipher algorithm coprocessor

Technical field

The present invention relates to a design method for a microprocessor, and in particular to a design method for a cryptographic algorithm coprocessor.

Background technology

The security of a cryptographic algorithm has two aspects: the security of the algorithm in the mathematical sense, and the security of its implementation. Traditional cryptanalysis attacks the algorithm itself, for example through differential and linear cryptanalysis; conventional cryptanalysis and brute-force search are ineffective against the cryptographic algorithms in wide use today. A power analysis attack is an effective key-recovery technique that exploits weaknesses in a specific implementation of a cryptographic algorithm, and it is among the most serious threats in the class of side-channel attacks. Because there is a statistical correlation between the private key inside a security chip and the power consumed while the cryptographic algorithm runs, an attacker who collects a large number of power traces can derive the on-chip private key by mathematical statistics. Power analysis attacks fall into three classes: simple power analysis (SPA), differential power analysis (DPA) and high-order differential power analysis (HODPA). Power analysis attacks are inexpensive to mount and apply to nearly all cryptographic algorithms, such as DES (Data Encryption Standard), AES (Advanced Encryption Standard), RSA (Rivest-Shamir-Adleman) and ECC (Elliptic Curve Cryptosystem).

Security chips, of which smart cards are representative, are widely used in many fields. Their main functions include secure data storage, data encryption and decryption, digital signature and authentication, and identity verification. These functions depend on modern cryptographic algorithms, including public-key algorithms, block ciphers and stream ciphers. Public-key algorithms such as RSA and ECC are mainly used for digital signature and authentication to verify identity; block ciphers such as DES and AES are used for data encryption and decryption; and stream ciphers such as RC4 (Rivest Cipher 4) are mainly used to encrypt and decrypt data streams. Cryptographic algorithm components of various kinds (software modules or hardware coprocessors) are indispensable parts of a security chip. Driven by illicit gain, attackers subject security chips to many types of attack, so the chips operate in a hostile security environment. Power analysis attacks against the cryptographic algorithm components are an effective way to break a security chip; the literature reports that power analysis has been used to break many different types of security chip. The cryptographic algorithm components in a security chip must therefore have effective protection against power analysis attacks.

According to their goal, protection techniques can be divided into two classes: those that eliminate the weakness that power analysis exploits in a specific implementation, and those that increase the difficulty of a power analysis attack. Eliminating the weakness means removing the correlation between the key and the power consumption; the most common technique is random masking. Random masking covers the intermediate results of the cryptographic computation with uniformly distributed random numbers, so that the masked intermediate results are themselves random, their probability distribution is independent of the key, and the power consumption is therefore independent of the key as well. From another angle, if the cost and time required for a power analysis attack are so high that it is practically infeasible, the goal of resisting power analysis can also be considered achieved. Common techniques for increasing the difficulty of a power analysis attack include: randomization techniques, such as inserting random redundant dummy operations into the computation, randomizing the order of operations, inserting random delays and introducing random power noise; constant-power techniques, which make the power consumed by the security chip while executing the cryptographic algorithm almost constant and thereby greatly weaken the correlation between power and key, for example by using novel dynamic dual-rail logic units with a constant-power characteristic such as sense-amplifier-based logic (SABL); and power smoothing techniques, which keep the power consumption of the working chip within a preset range by adding circuitry that dynamically compensates the power of the whole chip.

Every protection technique achieves its protection at some cost, mainly in the following respects. 1) Computational performance drops: inserting redundant operations into the cryptographic computation unavoidably reduces performance, and in a constant-power cryptographic component built from dynamic dual-rail logic only half of each clock cycle performs useful computation while the other half is spent precharging, so performance is at least halved. 2) Chip area grows: dynamic compensation circuits for power smoothing, random power noise generators and dynamic dual-rail logic all unavoidably increase chip area. 3) Power consumption rises: power noise, constant-power logic and power smoothing all increase power significantly; when the cryptographic algorithm is implemented in dynamic dual-rail logic, every logic unit uses the clock as its precharge signal, so the clock load increases significantly and the clock-related power inevitably increases too.

Some research also uses asynchronous circuits to implement cryptographic coprocessors that resist power analysis attacks, mainly because asynchronous circuits also have a certain constant-power characteristic. This characteristic derives mainly from the dual-rail encoding of signals and the complementary circuit structure. However, compared with implementations built from the sense-amplifier-based constant-power logic units mentioned above, the constant-power characteristic of asynchronous circuits is relatively poor, which means their protection is relatively weak. Compared with synchronous circuits, the notable advantage of asynchronous circuits is low power consumption, and low-power design and implementation techniques based on asynchronous circuits are a frontier research topic in the integrated circuit field; but asynchronous circuits have no advantage in computational performance or chip area. In addition, asynchronous circuits are relatively difficult to implement and lack mature design tools.

A block cipher is an algorithm for fast encryption and decryption of large amounts of data and is a key technology for information system security. Typical block ciphers include DES and AES, both of which are multi-round iterative algorithms. Block cipher computation modules (implemented in software or in hardware) are necessary components of security chips, and their implementations must also have effective protection against power analysis attacks. To date there has been no public report of an asynchronous-circuit design of a block cipher algorithm coprocessor that combines relatively high resistance to power analysis attacks with high computational performance and low power consumption.

Summary of the invention

The technical problem to be solved by the present invention is: under existing technical conditions, to provide a design method for an asynchronous block cipher algorithm coprocessor such that a coprocessor designed with this method has a good constant-power characteristic, that is, relatively high resistance to power analysis attacks, together with high computational performance and low power consumption.

To solve the above technical problem, the technical scheme of the present invention is: treat each round of iteration in the block cipher as an independent submodule; design each submodule in a hardware description language (HDL), each submodule being purely combinational; use an existing synthesis tool to perform logic synthesis on each submodule, obtaining a netlist that contains only inverters, two-input AND gates and two-input OR gates; convert this netlist into a netlist composed only of complementary two-input AND gates and OR gates; perform delay matching on each submodule by adding a delay matching module whose delay equals that of the submodule, ensuring that the delay from any input signal to the output signals of each submodule is identical and that the two inputs of every two-input AND gate or OR gate in the circuit arrive at the same time; connect the submodules in sequence to obtain the complete netlist of the asynchronous block cipher algorithm coprocessor; and perform back-end place and route to obtain the GDS (Graphic Data System) layout of the asynchronous block cipher algorithm coprocessor.

The specific technical scheme is:

First step: partition the block cipher into submodules, treating each round of iteration as an independent submodule. A block cipher consists of a number of rounds of iteration; for example, DES consists of 16 rounds and AES with a 128-bit key consists of 10 rounds. Each round is treated as an independent submodule whose main functions are the round transformation and the round key schedule. The submodules are denoted S_1, S_2, ..., S_i, ..., S_n (n ≥ 1), where n is the number of rounds of the block cipher and 1 ≤ i ≤ n. Let M be the initial plaintext, K the key and C the ciphertext; let R_j denote the result of the j-th round transformation (1 ≤ j ≤ n-1) and K_k the round key of the k-th round transformation (2 ≤ k ≤ n). The connections between the submodules can then be described as: (R_1, K_2) = F_1(M, K), (R_2, K_3) = F_2(R_1, K_2), ..., (R_{n-1}, K_n) = F_{n-1}(R_{n-2}, K_{n-1}), C = F_n(R_{n-1}, K_n), where F_i denotes the function of the i-th round transformation.
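The chaining of the round functions can be sketched in a few lines of Python. This is only an illustration: `run_rounds` and `toy_round` are hypothetical names, and the toy round is not a real cipher round. For simplicity every round function here returns a pair, and the key returned by the last round is discarded, whereas in the text F_n returns only C.

```python
from typing import Callable, Sequence, Tuple

# A round function F_i maps (state, round key) to (next state, next round key).
RoundFn = Callable[[int, int], Tuple[int, int]]

def run_rounds(m: int, k: int, rounds: Sequence[RoundFn]) -> int:
    """Chain F_1 .. F_n: (R_1, K_2) = F_1(M, K), ..., C = F_n(R_{n-1}, K_n)."""
    state, key = m, k
    for f in rounds:
        state, key = f(state, key)
    return state  # the state produced by the last round is the ciphertext C

def toy_round(state: int, key: int) -> Tuple[int, int]:
    """Toy 8-bit round (assumption, NOT a real cipher): XOR with the key, rotate the key."""
    next_state = (state ^ key) & 0xFF
    next_key = ((key << 1) | (key >> 7)) & 0xFF
    return next_state, next_key

ciphertext = run_rounds(0x5A, 0x0F, [toy_round, toy_round])
```

Each entry of `rounds` plays the role of one submodule S_i; in the hardware design the same chain is realized by wiring the outputs of one combinational submodule to the inputs of the next.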

Second step: submodule design. For each submodule S_i (1 ≤ i ≤ n), carry out the following steps in order:

2.1 Design the submodule in a hardware description language (HDL, such as VHDL or Verilog), that is, describe the function of the submodule. All arithmetic and logic operations of each submodule are implemented entirely with combinational circuits, with no sequential circuits. This yields the HDL code of the submodule.

2.2 Use an existing synthesis tool to perform logic synthesis on the HDL code of each submodule, using only three kinds of standard cell (inverters, two-input AND gates and two-input OR gates), to obtain the logic netlist of the submodule. All logic units in this netlist are static single-rail units, so it is called a static single-rail netlist. Because {¬, ∧, ∨} (logical NOT, logical AND and logical OR) is a functionally complete set of connectives in Boolean algebra and can realize any Boolean function, inverters, two-input AND gates and two-input OR gates can realize any logic operation. This step introduces no special standard cells and no extra constraints, so a mature commercial synthesis tool such as Synopsys Design Compiler(TM) can be used.

2.3 Convert the static single-rail netlist into a compound logic netlist composed only of complementary two-input AND gates and OR gates. A pair consisting of a two-input AND gate and its complementary two-input OR gate forms a compound logic unit, and the netlist after conversion is called the compound logic netlist. The conversion method is:

2.3.1 For every signal in the static single-rail netlist (input signals, output signals and internal interconnect signals), add the corresponding inverted signal, so that all signals are dual-rail encoded. If w is any signal in the static single-rail netlist, add its inverted signal ¬w.

2.3.2 Delete all inverters in the static single-rail netlist. Because step 2.3.1 has added the corresponding inverted signal for every signal, the compound logic netlist needs no inverters. Suppose an inverter in the static single-rail netlist is INV u1 (a, z), where INV denotes an inverter, u1 is the name of the inverter, a is the input signal and z is the output signal, that is, z is the inverse of a. Then delete the inverter u1 from the netlist and replace the signal z in the netlist with ¬a (where ¬a denotes the inverse of a).

2.3.3 For every two-input AND gate in the static single-rail netlist, add a complementary OR gate. Suppose a two-input AND gate in the static single-rail netlist is AND2 u2 (x1, x2, y1), where AND2 denotes a two-input AND gate, u2 is the name of the gate, x1 and x2 are the input signals and y1 is the output signal, that is, y1 = (x1 ∧ x2). Add the two-input OR gate OR2 ui2 (¬x1, ¬x2, ¬y1) complementary to u2, where OR2 denotes a two-input OR gate, ui2 is the name of the gate, ¬x1 and ¬x2 are the input signals and ¬y1 is the output signal, that is, ¬y1 = (¬x1 ∨ ¬x2).

2.3.4 For every two-input OR gate in the static single-rail netlist, add a complementary AND gate. Suppose a two-input OR gate in the static single-rail netlist is OR2 u3 (x3, x4, y2), that is, y2 = (x3 ∨ x4). Add the two-input AND gate AND2 ui3 (¬x3, ¬x4, ¬y2) complementary to u3, that is, ¬y2 = (¬x3 ∧ ¬x4).
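Steps 2.3.1 to 2.3.4 can be sketched as a small netlist rewrite. The following Python is a minimal sketch under stated assumptions, not the patent's tool flow: gates are tuples `(kind, name, inputs, output)`, the netlist is listed in topological order, the inverted rail of a signal is named with a hypothetical `_n` suffix (so original signal names must not end in `_n`), and the complementary gate gets a hypothetical `_c` name suffix.

```python
def inv(sig):
    """Name of the complementary rail; the '_n' suffix is a naming assumption."""
    return sig[:-2] if sig.endswith("_n") else sig + "_n"

def to_dual_rail(netlist):
    """Rewrite INV/AND2/OR2 gates into complementary AND2/OR2 pairs.

    Returns the new gate list and a resolver that maps any original signal
    to the rail that carries it once the inverters have been deleted.
    """
    # Step 2.3.2: an inverter output z becomes an alias for the inverse of its input a.
    alias = {out: inv(ins[0]) for kind, _, ins, out in netlist if kind == "INV"}

    def resolve(sig):
        while True:  # the loop follows chains of inverters
            base = sig[:-2] if sig.endswith("_n") else sig
            if base not in alias:
                return sig
            sig = alias[base] if sig == base else inv(alias[base])

    dual = {"AND2": "OR2", "OR2": "AND2"}
    gates = []
    for kind, name, ins, out in netlist:
        if kind == "INV":
            continue  # deleted; its output is reachable through resolve()
        a, b = (resolve(s) for s in ins)
        gates.append((kind, name, (a, b), out))
        # Steps 2.3.3 / 2.3.4: the complementary gate works on the inverted rails.
        gates.append((dual[kind], name + "_c", (inv(a), inv(b)), inv(out)))
    return gates, resolve

def evaluate(gates, inputs):
    """Evaluate gates (in topological order) from a dict of primary input bits."""
    values = {}
    for name, bit in inputs.items():
        values[name], values[inv(name)] = bit, 1 - bit  # step 2.3.1: both rails
    for kind, _, (a, b), out in gates:
        values[out] = values[a] & values[b] if kind == "AND2" else values[a] | values[b]
    return values

# Hypothetical three-gate example: y = NOT(x1 AND x2) OR x3.
example = [
    ("AND2", "u1", ("x1", "x2"), "n1"),
    ("INV", "u2", ("n1",), "n2"),
    ("OR2", "u3", ("n2", "x3"), "y"),
]
gates, resolve = to_dual_rail(example)
```

After conversion the example contains four gates and no inverter; for valid dual-rail inputs the `y_n` rail always carries the inverse of `y`, which is the property the complementary pairs are meant to provide.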

2.4 Perform delay matching by adding a delay matching module whose delay equals that of the submodule, ensuring that the delay from any input signal to the output signals of each submodule is identical and that the two inputs of every two-input AND gate or OR gate in the circuit arrive at the same time. The method is:

2.4.1 Add a delay matching module whose delay equals that of the submodule. The delay matching module consists of buffer units BUF connected in sequence. A BUF is likewise a compound logic unit made of a complementary two-input AND gate and OR gate, and the number of BUF stages equals the number of logic-unit stages on the critical path of the submodule. In the delay matching module, both inputs of the AND gate in a BUF are e and both inputs of the OR gate are ¬e (the inverted rail of e); the dual-rail signal (e, ¬e) is called the operation trigger control signal. When a valid operation is to be performed, the operation trigger control signal is set to (1, 0); when no valid operation is performed, it is set to (0, 0). The operation trigger control signal propagates stage by stage through the delay matching module. When the value (1, 0) has propagated from the input of the delay matching module to its output, that is, when the output of the delay matching module is (1, 0), the corresponding submodule has also completed its logic operation and its output is the correct result.

2.4.2 Add buffer units BUF to the output signals that are not on the critical path of the submodule, so that the delay from the input signals to every output signal is the same. Let the output signals of the submodule be O_1, O_2, ..., O_m (m ≥ 1), let the numbers of logic-unit stages on the critical path from the input signals to each output signal be H_1, H_2, ..., H_m respectively, and let H be the largest of these. Then add (H - H_p) stages of sequentially connected buffer units to each output signal O_p (1 ≤ p ≤ m). In each inserted buffer unit, besides the signal being buffered, the other pair of inputs comes from the output of the preceding buffer stage in the delay matching module. In this way the number of logic-unit stages on the critical path from the input signals to every buffered output signal is the same, and the maximum delays are equal.

2.4.3 Add buffer units BUF to the input signals that are not on the critical path of the submodule, so that the two inputs of every logic unit arrive at the same time. The circuit that computes any output signal of the submodule (equivalent to a logic expression) can be represented as a binary tree: each node of the tree represents a signal in the netlist, and its children are either empty (the signal is an input signal of the submodule) or the two inputs of the corresponding logic unit. The height of the tree represents the number of logic-unit stages traversed from the input signals to the output signal. Starting from the root (the output signal), traverse the tree level by level and ensure that signals at the same level arrive at the same time, so that logic units at the same level operate synchronously. Let h be the height of the binary tree for output signal o. The subtrees of the layer-2 nodes (the two children o_1 and o_2 of o) should have height h-1; if o_1 or o_2 is an input signal of the submodule, add h-2 stages of sequentially connected buffer units to it directly. The subtree of any node x in layer d (1 < d < h) should have height h-d+1; if x is an input signal, add h-d stages of sequentially connected buffer units to it directly. Continue until the nodes of layer h-1 have been traversed; if a node in this layer is an input signal, add one buffer stage to it directly.
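The buffer counts of steps 2.4.2 and 2.4.3 can be computed with a short tree walk. The sketch below makes several assumptions that are not in the text: an expression tree is a nested tuple `(left, right)` with input names as strings, height counts node levels (an input alone has height 1, so the number of gate stages is height minus 1), and a depth-first walk stands in for the level-order traversal described above, which yields the same counts. The example trees use the o1 and o2 expressions that the embodiment gives for Fig. 2, with o2 taken after inverter removal.

```python
def height(tree):
    """Node levels in the tree: an input (str) counts as 1, a gate is (left, right)."""
    if isinstance(tree, str):
        return 1
    return 1 + max(height(tree[0]), height(tree[1]))

def input_buffers(tree):
    """Step 2.4.3: buffers per input so that every input-to-output path has equal length."""
    h = height(tree)
    plan = []

    def walk(node, layer):
        if isinstance(node, str):
            plan.append((node, h - layer))  # an input at layer d gets h - d buffers
        else:
            walk(node[0], layer + 1)
            walk(node[1], layer + 1)

    walk(tree, 1)  # the root (output signal) is layer 1
    return plan

def output_buffers(trees):
    """Step 2.4.2: output p gets H - H_p buffers, H being the deepest critical path."""
    stages = {name: height(t) - 1 for name, t in trees.items()}  # gate stages H_p
    deepest = max(stages.values())
    return {name: deepest - s for name, s in stages.items()}

# o1 = ((i1 AND i2) AND (i3 OR i4)) OR (i5 AND i6); o2 is one gate on i7 and i8.
o1 = ((("i1", "i2"), ("i3", "i4")), ("i5", "i6"))
o2 = ("i7", "i8")

o1_plan = input_buffers(o1)
out_plan = output_buffers({"o1": o1, "o2": o2})
```

For this example the inputs i5 and i6 each need one buffer stage, and the shallow output o2 needs two, so that it becomes valid at the same stage as o1.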

Third step: integrate the delay-matched submodules, that is, connect S_1, S_2, ..., S_n in sequence. Submodule S_1 accepts the initial input signals, namely the plaintext M and the key K; S_i (2 ≤ i ≤ n-1) accepts the output produced by S_{i-1}, and its result serves as the input of S_{i+1}; S_n produces the final result, the ciphertext C. At the same time, connect the delay matching modules of the submodules in sequence, so that the number of logic-unit stages from S_1 to S_n equals the number of logic stages of all the sequentially connected delay matching units, which achieves delay matching for the whole coprocessor. Once all submodules have been integrated, the complete netlist of the asynchronous block cipher algorithm coprocessor is obtained.

Fourth step: perform back-end place and route to obtain the GDS layout of the asynchronous block cipher algorithm coprocessor. In the back-end design, all dual-rail signals must have identical loads so that each complementary pair of two-input AND and OR gates has a constant-power characteristic independent of its inputs. Under current integrated circuit process conditions, the parasitics caused by interconnect dominate in the circuit as a whole, so as long as the interconnect loads of a dual-rail signal are identical, the loads of the dual-rail signal are identical. Therefore all complementary two-input AND and OR gates are placed axisymmetrically, so that the two rails of each dual-rail signal have symmetric routing and equal interconnect length, which makes their loads almost exactly equal.

In essence, the asynchronous block cipher algorithm coprocessor designed by the above steps is a combinational circuit. Unlike a traditional combinational circuit, however, each group of data propagates stage by stage (in pipelined fashion) through the levels of logic units, and the next group of data can be applied without waiting for the current group to finish: once the first group of inputs has passed from the logic units of the first level to those of the next level, the second group can be applied. Likewise, when the first group of inputs has been fully processed and its result is available, the second group has reached the logic units of the second-to-last level. In other words, the coprocessor can be regarded as a pipeline that works as follows. When all input signals, including the operation trigger control signal, are 0, the logic units in the coprocessor settle stage by stage into a state with no switching and consume no dynamic power. When the operation trigger control signal is set to (1, 0) and valid data are applied, the data propagate stage by stage through the logic units of the coprocessor; when the output of the delay matching module corresponding to submodule S_n becomes (1, 0), the output of S_n is the valid result. Provided the minimum time interval is respected, multiple groups of data can be applied in succession, and the coprocessor can process multiple groups of data at the same moment. The minimum interval between adjacent input data depends on the maximum delay of a single logic unit in the coprocessor. To obtain a better constant-power characteristic, all-zero inputs and valid input data are applied to the coprocessor alternately.
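Why alternating all-zero inputs with valid data helps can be seen on a single compound logic unit. The following Python sketch counts, for every valid input pair, how many gate outputs rise when the unit leaves the all-zero state. It is only a transition count under idealized assumptions (equal loads, no glitches), not a power model, and the names `compound_cell` and `rising_outputs` are hypothetical.

```python
from itertools import product

def compound_cell(x1, x1_n, x2, x2_n):
    """One compound logic unit: AND gate on the true rails, OR gate on the inverted rails."""
    return x1 & x2, x1_n | x2_n

SPACER = (0, 0, 0, 0)  # all-zero input; both gate outputs are 0

def rising_outputs(x1, x2):
    """Number of gate outputs that rise when the unit goes from all-zero to (x1, x2)."""
    before = compound_cell(*SPACER)
    after = compound_cell(x1, 1 - x1, x2, 1 - x2)
    return sum(b == 0 and a == 1 for b, a in zip(before, after))

# One count per valid input combination (x1, x2).
counts = [rising_outputs(x1, x2) for x1, x2 in product((0, 1), repeat=2)]
```

Every entry of `counts` is 1: whatever the data, exactly one of the two complementary outputs switches, which is the data-independence that the alternation is meant to preserve.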

The coprocessor is equivalent to a multi-stage interconnection network in which each level of logic units accepts only the outputs of the preceding level. The coprocessor contains only local interconnect between adjacent levels and no interconnect that crosses levels, and logic units at the same level operate synchronously. At the macroscopic scale, each submodule is one section of the pipeline; at the microscopic scale, the logic units of one level form one pipeline stage. As long as the interval between adjacent input data is not less than the delay of the most heavily loaded logic unit in the pipeline (denoted Δt), the signals of different input data cannot overrun one another in the pipeline, which guarantees that the logic units of a given level process only one group of input data at any moment. From this point of view, the delay of the most heavily loaded logic unit determines the highest equivalent clock frequency (1/Δt) that the pipeline can reach, namely the reciprocal of the maximum delay. In practice, the maximum delay of a single logic unit is far smaller than the delay of one round of iteration, and the coprocessor contains no registers or latches, so the delays they would cause are avoided as well. Compared with a block cipher algorithm coprocessor implemented in the usual way (whether synchronous or asynchronous, with one round of iteration as one pipeline stage), an asynchronous block cipher algorithm coprocessor designed according to the present invention can reach a far higher equivalent clock frequency. When the coprocessor runs at full load, the highest encryption and decryption throughput it can reach is (1/Δt).
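The relation between Δt and throughput is simple arithmetic, sketched below in Python. The value of Δt (0.5 ns) and the 128-bit block size are illustrative assumptions only; the text gives no concrete delay figures.

```python
def equivalent_frequency_hz(delta_t_s: float) -> float:
    """Highest equivalent clock frequency 1/Δt for a minimum input spacing Δt (seconds)."""
    return 1.0 / delta_t_s

def throughput_bits_per_s(delta_t_s: float, block_bits: int) -> float:
    """Peak throughput in bits per second when one block enters every Δt."""
    return block_bits / delta_t_s

# Hypothetical numbers: Δt = 0.5 ns and a 128-bit block.
freq = equivalent_frequency_hz(0.5e-9)
bits = throughput_bits_per_s(0.5e-9, 128)
```

Note that this is the peak figure stated in the text; if all-zero inputs are interleaved with valid data for a better constant-power characteristic, only part of the input slots carry blocks.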

The present invention achieves the following technical effects:

1. In the submodule design of the second step, the asynchronous block cipher algorithm coprocessor is implemented using only complementary two-input AND gates and OR gates. When all-zero inputs and valid data are applied alternately to a complementary pair of two-input AND and OR gates, the compound logic formed by the pair has a good constant-power characteristic. A block cipher algorithm coprocessor built from such complementary pairs therefore has a good constant-power characteristic as well: the correlation between the key and the power consumption of the coprocessor approaches 0, and the coprocessor has relatively high resistance to power analysis attacks.

2. In the submodule design of the second step, delay matching is performed on each submodule so that logic units at the same level operate synchronously and the logic units of one level form one pipeline stage. As long as a certain time interval is respected, multiple groups of data can be applied to the coprocessor in succession. The minimum interval between adjacent input data depends on the maximum delay of a single logic unit in the coprocessor. Because this delay is far smaller than the delay of one round of iteration, and because the coprocessor contains no registers or latches and so avoids the delays they would cause, the highest equivalent clock frequency of an asynchronous block cipher algorithm coprocessor designed according to the present invention can be far higher than that of a coprocessor designed in the conventional way (whether synchronous or asynchronous, with one round of iteration as one pipeline stage), and computational performance is greatly improved.

3. An asynchronous block cipher algorithm coprocessor designed according to the present invention has no clock signal, no sequential logic such as registers, and no long interconnect. When the coprocessor is idle, all input signals stay at 0, no signal switches inside the coprocessor and only static power is consumed, so the coprocessor has a low-power characteristic.

4. The present invention requires no specially designed logic units and can use existing standard cells directly. It makes maximum use of mature integrated circuit design methods, including HDL design, logic synthesis and back-end design. Delay matching (inserting the necessary buffer units into the compound logic netlist of each submodule) keeps the asynchronous pipeline correct without the handshake circuits of conventional asynchronous circuits, which must detect whether the combinational logic has finished computing and reached a stable state, send an acknowledge signal to the previous stage and send a request signal to the next stage. The present invention therefore makes full use of existing techniques and tools and is simple to apply.

The present invention is applicable to the design and implementation of block cipher algorithm coprocessors in any security chip that may be subjected to power analysis attacks, and achieves a good trade-off among resistance to power analysis attacks, computational performance and power consumption.

Description of drawings

Fig. 1 is the overall flow chart for designing an asynchronous block cipher algorithm coprocessor according to the present invention;

Fig. 2 is a schematic diagram of the netlist conversion process of step 2.3 in the second step;

Fig. 3 is a schematic diagram of the delay matching process of step 2.4 in the second step;

Fig. 4 is the overall structure diagram of an asynchronous block cipher algorithm coprocessor designed according to the present invention;

Fig. 5 is a schematic diagram of how the asynchronous block cipher algorithm coprocessor operates.

Embodiment

Fig. 1 is the design flow chart for an asynchronous block cipher algorithm coprocessor according to the present invention. The flow mainly comprises the following steps:

1. Partition the block cipher into submodules.

2. Submodule design, comprising the following steps:

2.1 HDL design, obtaining the HDL code of the submodule.

2.2 Perform logic synthesis on the HDL code of the submodule, obtaining the static single-rail netlist of the submodule.

2.3 Convert the static single-rail netlist into a netlist composed only of complementary two-input AND gates and OR gates, obtaining the compound logic netlist.

2.4 Add a delay matching module whose delay equals that of the submodule, ensuring that the delay from any input signal to the output signals of each submodule is identical and that the two inputs of every two-input AND gate or OR gate in the circuit arrive at the same time.

3. Integrate the delay-matched netlists of the submodules, obtaining the complete netlist of the asynchronous block cipher algorithm coprocessor.

4. Perform back-end place and route, obtaining the GDS layout of the asynchronous block cipher algorithm coprocessor.

Fig. 2 is a schematic diagram of the netlist conversion process of step 2.3 in the second step. i1, i2, ..., i8 are input signals, o1 and o2 are output signals, and n1, n2, ..., n5 are interconnect signals; ¬x denotes the inverted signal of x. The left side of the arrow in Fig. 2 is the logic circuit before netlist conversion (the static single-rail netlist), in which the logic expressions of the output signals are: o1 = ((i1 ∧ i2) ∧ (i3 ∨ i4)) ∨ (i5 ∧ i6), o2 = ¬(i7 ∧ i8). The right side of the arrow in Fig. 2 is the logic circuit after netlist conversion, in which the logic expressions of the added and replaced output signals are: ¬o1 = ((¬i1 ∨ ¬i2) ∨ (¬i3 ∧ ¬i4)) ∧ (¬i5 ∨ ¬i6), o2 = (¬i7 ∨ ¬i8), ¬o2 = (i7 ∧ i8). The netlist conversion proceeds as follows:

1. Add a corresponding inverted signal for every signal, i.e. ¬i1, ..., ¬i8, ¬o1, ¬o2, ¬n1, ..., ¬n5 on the right side of the arrow in Fig. 2.

2. Delete all inverters in the static single-rail netlist and replace the output signal of each inverter with the inverse of its input signal. As shown on the right side of the arrow in Fig. 2, inverter 6 on the left side is deleted and o2 is replaced with ¬n4.

3. For every two-input AND gate in the static single-rail netlist, add a complementary OR gate. As shown on the right side of the arrow in Fig. 2, the first AND gate 1, third AND gate 3, fourth AND gate 4 and fifth AND gate 5 on the left side are complemented by the eighth OR gate 8, tenth OR gate 10, eleventh OR gate 11 and twelfth OR gate 12, respectively.

4. For every two-input OR gate in the static single-rail netlist, add a complementary AND gate. As shown on the right side of the arrow in Fig. 2, the second OR gate 2 and seventh OR gate 7 on the left side are complemented by the ninth AND gate 9 and thirteenth AND gate 13, respectively.
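The four conversion steps can be illustrated with a short Python model. This is a minimal sketch under stated assumptions, not the tool flow of the invention: the tuple-based netlist encoding, the `~` prefix for the inverted rail, the function names, and the assignment of n1 to n5 to particular gates in `FIG2` are all hypothetical choices made for illustration, since the text does not give the wiring of Fig. 2 in detail.

```python
from itertools import product

def neg(sig):
    """Name of the opposite rail: 'x' <-> '~x'."""
    return sig[1:] if sig.startswith("~") else "~" + sig

def to_dual_rail(gates):
    """gates: list of (kind, inputs, output), kind in {'INV', 'AND2', 'OR2'}."""
    # Step 2: every inverter is deleted; its output becomes an alias of '~input'.
    alias = {out: neg(ins[0]) for kind, ins, out in gates if kind == "INV"}

    def resolve(sig):
        inverted = sig.startswith("~")
        base = sig[1:] if inverted else sig
        while base in alias:              # alias targets always look like '~name'
            inverted = not inverted
            base = alias[base][1:]
        return ("~" if inverted else "") + base

    dual = []
    for kind, ins, out in gates:
        if kind == "INV":
            continue
        a, b = (resolve(s) for s in ins)
        comp = "OR2" if kind == "AND2" else "AND2"
        dual.append((kind, (a, b), out))                 # original gate
        dual.append((comp, (neg(a), neg(b)), neg(out)))  # Steps 3 and 4: complementary gate
    return dual, resolve

def evaluate(dual, rails):
    """Evaluate gates in list order (assumed topological); rails maps rail name -> 0/1."""
    v = dict(rails)
    for kind, (a, b), out in dual:
        v[out] = (v[a] & v[b]) if kind == "AND2" else (v[a] | v[b])
    return v

def rails_stay_complementary(dual, input_names):
    """Check that for every valid input, each signal and its inverse carry opposite values."""
    for bits in product((0, 1), repeat=len(input_names)):
        rails = {}
        for name, bit in zip(input_names, bits):
            rails[name], rails[neg(name)] = bit, 1 - bit   # Step 1: inverted inputs
        v = evaluate(dual, rails)
        if any(v[s] + v[neg(s)] != 1 for s in list(v)):
            return False
    return True

# Single-rail netlist matching the expressions given for Fig. 2 (wiring of n1..n5 assumed).
FIG2 = [
    ("AND2", ("i1", "i2"), "n1"),
    ("OR2",  ("i3", "i4"), "n2"),
    ("AND2", ("n1", "n2"), "n3"),
    ("AND2", ("i5", "i6"), "n5"),
    ("AND2", ("i7", "i8"), "n4"),
    ("INV",  ("n4",),      "o2"),
    ("OR2",  ("n3", "n5"), "o1"),
]
```

Calling `to_dual_rail(FIG2)` removes the inverter and pairs each remaining gate with its complement, and `resolve("o2")` reports which rail now carries o2. Because the converted netlist contains only AND and OR gates, driving every rail with 0 leaves every signal at 0, which is the idle state described for the coprocessor.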

Fig. 3 is a schematic diagram of the delay matching process of step 2.4. Delay matching mainly comprises the following steps:

1. Add a delay matching module whose delay equals that of the submodule. In the circuit on the left side of the arrow in Fig. 3, the critical path is the data path from input signal i1 (or i2) to o1 and comprises 3 levels of logic cells. The delay matching module for this circuit therefore comprises 3 buffer cells BUF connected in series. As shown on the right side of the arrow in Fig. 3, the fourteenth AND gate 14 and fifteenth OR gate 15, the sixteenth AND gate 16 and seventeenth OR gate 17, and the eighteenth AND gate 18 and nineteenth OR gate 19 form the 3 series-connected BUFs, respectively. The input of the delay matching module is the operation trigger control signal, denoted (e1, ¬e1); the output is denoted (e4, ¬e4), and the intermediate results are denoted (e2, ¬e2) and (e3, ¬e3). In the delay matching module, the two inputs of each AND gate are tied to the same signal, namely e1, e2 and e3 respectively; likewise, the two inputs of each OR gate are tied to the same signal, namely ¬e1, ¬e2 and ¬e3 respectively.

2. Add buffer cells BUF for output signals that are not on the critical path of the submodule, so that the delay from the input signals to every output signal is identical. In the circuit on the left side of the arrow in Fig. 3, the path from the input signals to output signals o2 and ¬o2 has 1 level of logic cells, fewer than the number of levels from the input signals to o1, so two levels of buffer cells are added for o2 and ¬o2. As shown on the right side of the arrow in Fig. 3, the twentieth AND gate 20 and twenty-first OR gate 21, and the twenty-second AND gate 22 and twenty-third OR gate 23, form two series-connected BUFs: the former buffers (n4, ¬n4) to obtain (n6, ¬n6), and the latter buffers (n6, ¬n6) to obtain (o2, ¬o2). The two inputs of the twentieth AND gate 20 are n4 and e2; those of the twenty-first OR gate 21 are ¬n4 and ¬e2; those of the twenty-second AND gate 22 are n6 and e3; and those of the twenty-third OR gate 23 are ¬n6 and ¬e3.

3. Add buffer cells for input signals that are not on the critical path of the submodule, so that the two inputs of every logic cell arrive at the same time. In the circuit on the left side of the arrow in Fig. 3, (i5, ¬i5) and (i6, ¬i6) are not on the critical path, so one level of buffer cell is added for each of them. As shown on the right side of the arrow in Fig. 3, the BUF formed by the twenty-fourth AND gate 24 and twenty-fifth OR gate 25 buffers (i5, ¬i5) to obtain (n7, ¬n7); the BUF formed by the twenty-sixth AND gate 26 and twenty-seventh OR gate 27 buffers (i6, ¬i6) to obtain (n8, ¬n8). The two inputs of the twenty-fourth AND gate 24 are i5 and e1; those of the twenty-fifth OR gate 25 are ¬i5 and ¬e1; those of the twenty-sixth AND gate 26 are i6 and e1; and those of the twenty-seventh OR gate 27 are ¬i6 and ¬e1.
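The buffer counts in the three steps above can be reproduced with a small Python sketch. It is illustrative only and rests on assumptions: the function names, the representation of the o1 logic cone as nested tuples, the tap names `e2`, `e3`, and the restriction that each input appears once in the tree are hypothetical simplifications, not part of the described method.

```python
from collections import deque

def buf(data, trigger):
    """Dual-rail BUF: an AND gate on the true rails, an OR gate on the inverted rails."""
    d, d_n = data
    e, e_n = trigger
    return (d & e, d_n | e_n)

def output_buffer_stages(levels):
    """levels: output name -> logic levels on its path.
    Returns, per output, the trigger taps gating each padding BUF."""
    deepest = max(levels.values())
    plan = {}
    for name, depth in levels.items():
        # A padding BUF placed at level k is gated by tap e_k of the delay matching module.
        plan[name] = [f"e{k}" for k in range(depth + 1, deepest + 1)]
    return plan

def height(node):
    """Number of layers in the tree; a leaf (input signal name) counts as one layer."""
    return 1 if isinstance(node, str) else 1 + max(height(c) for c in node)

def input_buffer_stages(tree):
    """Walk the tree level by level; an input found on layer d needs h - d BUF stages."""
    h = height(tree)
    need = {}
    queue = deque([(tree, 1)])          # the root (output signal) is layer 1
    while queue:
        node, d = queue.popleft()
        if isinstance(node, str):
            need[node] = h - d
        else:
            queue.extend((child, d + 1) for child in node)
    return need

# Logic cone of o1 = ((i1 AND i2) AND (i3 OR i4)) OR (i5 AND i6); gate types do not
# matter for level counting, so each gate is just a pair of children.
FIG3_O1 = ((("i1", "i2"), ("i3", "i4")), ("i5", "i6"))
```

For the Fig. 3 circuit, `output_buffer_stages({"o1": 3, "o2": 1})` assigns two padding stages to o2, and `input_buffer_stages(FIG3_O1)` assigns one stage each to i5 and i6 and none to i1 through i4, in line with the gate counts given in the text. The `buf` function shows why the trigger can gate data: with trigger (1, 0) the data pair passes through unchanged.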

Fig. 4 is the overall structure of an asynchronous block cipher algorithm coprocessor designed with the present invention; it also illustrates the module integration of step 3. Submodules S_1, S_2, ..., S_n are connected in series. S_1 accepts the initial plaintext M and key K, and S_n produces the result, the ciphertext C; S_i (1 < i < n) accepts the output of S_{i-1} and provides its result to S_{i+1}. There are no latches between adjacent submodules. Each submodule has a corresponding delay matching module, which consists of series-connected buffer cells; the number of buffer stages equals the number of logic levels on the critical path of the submodule. The input of a delay matching module is the operation trigger control signal. The trigger inputs of the delay matching modules corresponding to the submodules are denoted (ei1, ¬ei1), (ei2, ¬ei2), ..., (ein, ¬ein), and their outputs are denoted (eo1, ¬eo1), (eo2, ¬eo2), ..., (eon, ¬eon). A group of input data propagates stage by stage through the submodules while the operation trigger control signal propagates stage by stage through the delay matching modules. When the plaintext, key and operation trigger control signal are all 0, the coprocessor enters, stage by stage, a state with no signal transitions and consumes only static power. When (ei1, ¬ei1) is (1, 0) and the plaintext and key hold valid values, the coprocessor starts effective processing; after stage-by-stage propagation, when (eon, ¬eon) becomes (1, 0), the output of S_n is the corresponding ciphertext C.
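The series connection of submodules and delay matching modules can be sketched at a purely functional level in Python. This is a hedged illustration: `toy_round` and `toy_final` are hypothetical 8-bit stand-ins for the round functions and do not correspond to any real block cipher, the per-round level counts are invented, and timing is abstracted away so that only the data and trigger dependencies remain.

```python
def delay_chain(trigger, stages):
    """Propagate (e, ~e) through `stages` BUF cells whose two inputs are tied together."""
    e, e_n = trigger
    for _ in range(stages):
        e, e_n = e & e, e_n | e_n       # AND gate on e, OR gate on ~e
    return (e, e_n)

def run_coprocessor(rounds, plaintext, key, trigger):
    """rounds: list of (round_function, logic_levels), one entry per submodule.
    Every round except the last returns (result, next round key);
    the last round returns the ciphertext."""
    state, round_key = plaintext, key
    for function, levels in rounds[:-1]:
        state, round_key = function(state, round_key)   # submodule S_i
        trigger = delay_chain(trigger, levels)          # its delay matching module
    function, levels = rounds[-1]
    return function(state, round_key), delay_chain(trigger, levels)

def toy_round(r, k):
    """Hypothetical round: mix state with key, rotate the key left by one bit."""
    return ((r ^ k) * 5 + 1) & 0xFF, ((k << 1) | (k >> 7)) & 0xFF

def toy_final(r, k):
    """Hypothetical last round producing the ciphertext."""
    return r ^ k

ROUNDS = [(toy_round, 3), (toy_round, 3), (toy_final, 2)]
```

With the trigger set to (1, 0), `run_coprocessor(ROUNDS, m, k, (1, 0))` returns the ciphertext together with a trigger output of (1, 0), which in the hardware marks the moment the output of the last submodule is valid; with the trigger at (0, 0) the trigger output stays (0, 0).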

Fig. 5 is a schematic diagram of the operating mode of an asynchronous block cipher algorithm coprocessor designed with the present invention. Suppose the coprocessor comprises N levels of logic cells and the maximum delay of a single logic cell is Δt. The minimum interval between adjacent input data entering the coprocessor is then Δt. Writing the interval between successive input data as a × Δt, a may take any value not less than 1. One valid cipher operation takes N × Δt, and several groups of data may be in process at the same instant. When all inputs of the coprocessor are 0, the coprocessor enters, stage by stage, a state with no signal transitions. For a more constant power profile, all-0 inputs and valid data inputs can enter the coprocessor alternately; for higher performance, i.e. throughput, valid data can also enter continuously. When the coprocessor operates at full load, the throughput is 1/(2Δt) if all-0 inputs and valid data inputs alternate, and 1/Δt if valid data enter the coprocessor pipeline continuously. The maximum delay Δt of a single logic cell is far smaller than the delay of one round of iteration, and because the coprocessor contains no registers or latches, it also avoids the delay they introduce. Compared with block cipher algorithm coprocessors implemented in the conventional manner (synchronous or asynchronous circuits in which one round of iteration forms one pipeline stage), the performance of an asynchronous block cipher algorithm coprocessor designed with the present invention is therefore far higher.
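The timing relations of Fig. 5 reduce to simple arithmetic, shown below as a Python sketch. The function names and the numeric values used in the usage note are assumptions chosen for illustration; the text gives no concrete values for N or Δt.

```python
def latency(levels, dt):
    """Time for one valid cipher operation: N logic levels times the cell delay dt."""
    return levels * dt

def input_interval(a, dt):
    """Interval between successive inputs, a * dt, where a must be at least 1."""
    if a < 1:
        raise ValueError("inputs may not enter faster than one per cell delay")
    return a * dt

def throughput(dt, alternate_with_zero=False):
    """Operations per unit time at full load.
    Alternating all-0 inputs with valid data halves the rate."""
    return 1.0 / (2.0 * dt) if alternate_with_zero else 1.0 / dt
```

For example, with a hypothetical N = 200 levels and Δt = 0.5 ns, `latency(200, 0.5)` gives the time of one operation in nanoseconds, while `throughput(0.5)` and `throughput(0.5, alternate_with_zero=True)` give the continuous and alternating rates in operations per nanosecond.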

Claims

1. A method for designing an asynchronous block cipher algorithm coprocessor, characterized by comprising the following steps:

In the first step, the block cipher algorithm is partitioned into submodules, with each round of iteration of the block cipher algorithm as an independent submodule; the submodules are denoted S_1, S_2, ..., S_i, ..., S_n (n ≥ 1), where n is the number of iteration rounds of the block cipher algorithm; let M be the initial plaintext, K the key and C the ciphertext, let R_j denote the result of the j-th round transformation and K_k the round key of the k-th round transformation, 1 ≤ j ≤ n-1, 2 ≤ k ≤ n; the connections between the submodules are: (R_1, K_2) = F_1(M, K), (R_2, K_3) = F_2(R_1, K_2), ..., (R_{n-1}, K_n) = F_{n-1}(R_{n-2}, K_{n-1}), C = F_n(R_{n-1}, K_n), where F_i denotes the function of the i-th round transformation, 1 ≤ i ≤ n;

In the second step, submodule design, the following steps are carried out in turn for each submodule S_i:

Step 1: describe the function of the submodule in a hardware description language (HDL), implementing all arithmetic and logic operations of each submodule entirely with combinational circuits, to obtain the HDL code of the submodule;

Step 2: use an existing synthesis tool to perform logic synthesis on the HDL code of each submodule, using only three kinds of standard cells, namely inverters, two-input AND gates and two-input OR gates, to obtain the static single-rail netlist of the submodule;

Step 3: convert the static single-rail netlist into a composite logic netlist composed only of complementary two-input AND gates and OR gates; the conversion method is:

Sub-step 1: add a corresponding inverted signal for every signal in the static single-rail netlist: let w be any signal in the static single-rail netlist; its inverted signal ¬w is then added;

Sub-step 2: delete all inverters in the static single-rail netlist: let an inverter in the static single-rail netlist be INV u1 (a, z), where INV denotes an inverter, u1 is the name of the inverter, a is the input signal and z is the output signal, i.e. z is the inverse of a; the inverter u1 is deleted from the netlist and the signal z in the netlist is replaced with ¬a;

Sub-step 3: for every two-input AND gate in the static single-rail netlist, add a complementary OR gate: let a two-input AND gate in the static single-rail netlist be AND2 u2 (x1, x2, y1), where AND2 denotes a two-input AND gate, u2 is the name of the gate, x1 and x2 are the input signals and y1 is the output signal, i.e. y1 = (x1 ∧ x2); a two-input OR gate OR2 ui2 (¬x1, ¬x2, ¬y1) complementary to u2 is added, where OR2 denotes a two-input OR gate, ui2 is the name of the gate, ¬x1 and ¬x2 are the input signals and ¬y1 is the output signal, i.e. ¬y1 = (¬x1 ∨ ¬x2);

Sub-step 4: for every two-input OR gate in the static single-rail netlist, add a complementary AND gate: let a two-input OR gate in the static single-rail netlist be OR2 u3 (x3, x4, y2), i.e. y2 = (x3 ∨ x4); a two-input AND gate AND2 ui3 (¬x3, ¬x4, ¬y2) complementary to u3 is added, i.e. ¬y2 = (¬x3 ∧ ¬x4);

Step 4: perform delay matching by adding a delay matching module whose delay equals that of the submodule; the method is:

Step 1): add a delay matching module whose delay equals that of the submodule; the delay matching module consists of series-connected buffer cells BUF, a BUF being a composite logic cell formed by a complementary two-input AND gate and OR gate, and the number of BUF stages equals the number of logic levels on the critical path of the submodule; both inputs of the AND gate in a BUF are e and both inputs of the OR gate are ¬e, and the dual-rail signal (e, ¬e) is called the operation trigger control signal; when an effective operation is carried out, the operation trigger control signal is set to (1, 0); when no effective operation is carried out, it is set to (0, 0); the operation trigger control signal propagates stage by stage through the delay matching module; when the operation trigger control signal (1, 0) has propagated from the input to the output of the delay matching module, i.e. when the output of the delay matching module is (1, 0), the corresponding submodule has also completed its effective logic operation and the output of the submodule is the correct result;

Step 2): add buffer cells BUF for output signals that are not on the critical path of the submodule: let the output signals of the submodule be O_1, O_2, ..., O_m, m ≥ 1, and let the numbers of logic levels on the critical paths from the input signals to each output signal be H_1, H_2, ..., H_m, the largest of which is H; then H - H_p series-connected buffer cells are added for each output signal O_p, 1 ≤ p ≤ m; in each inserted buffer cell, besides the buffered signal, the other input pair comes from the output of the preceding-stage buffer cell in the delay matching module;

Step 3): add buffer cells BUF for input signals that are not on the critical path of the submodule: the logic that computes any output signal of the submodule is represented as a binary tree in which each node represents a signal in the netlist; the children of a node are either empty, meaning that the signal is an input signal of the submodule, or are the two inputs of the corresponding logic cell; the height of the binary tree represents the number of logic levels traversed from the input signals to the output signal; starting from the root node of the binary tree, i.e. the output signal, the tree is traversed level by level so that signals on the same level arrive at the same time and logic cells on the same level carry out their effective operations synchronously; let the height of the binary tree corresponding to output signal o be h; the subtrees of the second-layer nodes, i.e. the two children o_1 and o_2 of o, have height h-1; if o_1 or o_2 is an input signal of the submodule, h-2 series-connected buffer cells are added directly for it; the subtree of any node x on layer d should have height h-d+1, 1 < d < h, and if x is an input signal, h-d series-connected buffer cells are added directly for it; the traversal continues down to layer h-1, where one buffer cell is added directly for any node that is an input signal;

In the third step, the delay-matched submodules are integrated, i.e. S_1, S_2, ..., S_n are connected in series: submodule S_1 accepts the initial input signals, namely the plaintext M and key K; S_l accepts the output produced by S_{l-1}, and its result serves as the input of S_{l+1}, 2 ≤ l ≤ n-1; S_n produces the final result, the ciphertext C; at the same time, the delay matching modules corresponding to the submodules are connected in series; after all submodules are integrated, the complete netlist of the asynchronous block cipher algorithm coprocessor is obtained;

In the fourth step, back-end placement and routing is carried out to obtain the GDS layout of the asynchronous block cipher algorithm coprocessor.

2. The method for designing an asynchronous block cipher algorithm coprocessor as claimed in claim 1, characterized in that, during back-end placement and routing, all complementary two-input AND gates and OR gates are placed axially symmetrically, so that the dual-rail signals have symmetric routing and identical interconnect length.