CN102929574A

CN102929574A - Design method for a systolic multiplier over GF(2^163)

Info

Publication number: CN102929574A
Application number: CN2012103952515A
Authority: CN
Inventors: 任俊彦; 黄佳森; 叶凡; 李宁; 许俊; 李巍
Original assignee: Fudan University
Current assignee: Fudan University
Priority date: 2012-10-18
Filing date: 2012-10-18
Publication date: 2013-02-13

Abstract

The invention belongs to the technical field of digital integrated circuits and systems, and specifically relates to a design method for a systolic multiplier over GF(2^163). The method uses a fully parallel 8*8 module as the basic cell and cascades 21 such modules. A retiming algorithm is applied to redistribute the combinational logic of the whole circuit, which raises the maximum clock frequency of the system.

Description

Design method for a systolic multiplier over GF(2^163)

Technical field

The invention belongs to the field of digital integrated circuits and systems, and specifically relates to a design method for a systolic multiplier over GF(2^163).

Background art

Finite field theory is not only of great theoretical value in modern mathematics but is also widely applied throughout the information society. The Galois field multiplier is a basic building block in many applications, such as elliptic curve cryptography, error-correcting codes and digital signal processing. Designing an efficient Galois field multiplier is therefore of great significance for improving system performance.

In the paper "Design and implementation of a reconfigurable Galois field multiplier based on RS codes", published by Tan Siwei in Computer Applications and Software in August 2011, the natural-basis algorithm was adopted, multiplication over the Galois field was realized with simple logic gates, and a reconfigurable iterative computation structure based on RS codes was proposed that supports field lengths of 3 to 8. This design meets the requirements to some extent; in particular, because RS codes have strong error-correcting capability, it gives a degree of assurance that the multiplication is correct. However, its FPGA-based implementation has two shortcomings: (1) it does not follow a "bottom-up" design philosophy and does not make good use of submodules to build multipliers suitable for high-order fields; (2) it does not optimize the circuit delay, so the maximum clock frequency of the system is not raised as far as possible.

The present invention first builds an 8*8-bit basic multiplication cell and then, from this cell, builds a multiplier over the Galois field of high order 163, which makes it usable in computation-intensive digital signal processing. In addition, the invention optimizes the circuit structure using retiming, which raises the maximum clock frequency of the system to some extent and supports high-speed digital design.

Summary of the invention

The object of the present invention is to provide a systolic multiplier over GF(2^163) with small area and high performance. It raises the maximum clock frequency of the system and can meet the multiplier requirements of error-control systems that ensure reliable data transmission, encryption systems that ensure information security, and computation-intensive digital signal processing systems.

The invention provides a design method for a systolic multiplier over GF(2^163), in which the whole system is built as a 21-stage cascade with a fully parallel 8*8 module as the basic cell, and a retiming algorithm is used to adjust the distribution of the combinational logic in the circuit.

In the present invention, a systolic structure is used to partition the multiplier architecture over the given field.

In the present invention, the fully parallel 8*8 basic cell over the Galois field GF(2^m) is built with the most-significant-bit-first (MSB-first) multiplication algorithm, described as follows:

Input: $A(x)=\sum_{j=0}^{m-1} a_j x^j$, $B(x)=\sum_{j=0}^{m-1} b_j x^j$, $G(x)=x^m+\sum_{j=0}^{m-1} g_j x^j$
Output: $P(x)=A(x)B(x) \bmod G(x)$
Initialization: $t_j^{(0)}=0$ for $0 \le j \le m-1$; $t_{-1}^{(i)}=0$ for $0 \le i \le m-1$
for $i = 1$ to $m$ do
    for $j = m-1$ to $0$ do
        $t_j^{(i)} = t_{j-1}^{(i-1)} \oplus t_{m-1}^{(i-1)} g_j \oplus a_j b_{m-i}$
$P(x) = T^{(m)}(x)$

where $t_j^{(i)}$ denotes the j-th coefficient of $T^{(i)}(x)$; $a_j$, $b_j$ and $g_j$ denote the j-th coefficients of $A(x)$, $B(x)$ and $G(x)$, respectively; $\oplus$ denotes addition over GF(2) (XOR); and m = 163. When designing the fully parallel 8*8 basic cell, the present invention adopts a coordinate transformation, namely:

(1) [coordinate transformation; formula not legible in the source]

where i denotes the iteration index of the circuit and j denotes the exponent index of the polynomial. The transformation removes the bidirectional inputs between modules along the horizontal direction, so that the structure can be mapped onto a one-dimensional dependency graph.

In the present invention, the fully parallel structure graph is partitioned into N*N modules, each containing L*L systolic cells, and a retiming algorithm is used to optimize the distribution of the combinational logic in the circuit. This comprises:

adding a control signal Ctrl = 011...1 of length N+1, where A(x) and G(x) enter the fully parallel 8*8 module in serial mode and B(x) enters it in parallel mode;

adding L 1-bit registers and L multiplexers (MUXes) to each fully parallel 8*8 module, used to broadcast the computed result $t_{m-1}^{(i-1)}$ to every cell in row i;

adding a register of 3(L-1) delay units to each fully parallel 8*8 module, used to hold the data passed diagonally from the rightmost cell of region k to the leftmost cell of region k+1;

adding a further L MUXes and L 1-bit registers to each fully parallel 8*8 module so that B(x) can also enter the module in serial mode: when the Ctrl signal is 0, B(x) is loaded into the module. This yields the retimed circuit structure.
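To give a rough sense of what the retiming steps above cost in hardware, the sketch below tallies only the elements the text says are added to each module (L + L 1-bit registers, 3(L-1) delay units and 2L MUXes) and builds the Ctrl pattern 011...1 of length N+1. It assumes N = ceil(m/L) modules in the cascade and counts each delay unit as one 1-bit register; the function name `retiming_overhead` is hypothetical, and the systolic cells themselves are not counted.

```python
import math


def retiming_overhead(m, L):
    """Count the register and MUX elements added by the retiming steps.

    Assumptions (not from the source): N = ceil(m / L) modules, and each of
    the 3(L-1) delay units is one 1-bit register. The L*L cells are ignored.
    """
    N = math.ceil(m / L)                      # number of cascaded modules
    broadcast_regs, broadcast_muxes = L, L    # broadcast t_{m-1} to every cell in a row
    diagonal_regs = 3 * (L - 1)               # diagonal data from region k to region k+1
    serial_b_regs, serial_b_muxes = L, L      # serial loading of B(x)
    regs = broadcast_regs + diagonal_regs + serial_b_regs
    muxes = broadcast_muxes + serial_b_muxes
    return {
        "N": N,
        "regs_per_module": regs,
        "muxes_per_module": muxes,
        "total_regs": N * regs,
        "total_muxes": N * muxes,
        "ctrl": "0" + "1" * N,                # Ctrl = 011...1, length N + 1
    }


r = retiming_overhead(163, 8)
print(r["N"], r["regs_per_module"], r["muxes_per_module"])   # 21 37 16
```

For m = 163 and L = 8 this gives the 21 modules of the cascade, each with 37 added 1-bit registers and 16 added MUXes under the stated assumptions.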

In the present invention, an L-bit digit-serial systolic multiplier containing N basic modules can operate over GF(2^m) with the following input sequences, where [condition not legible in the source]:

(2) [input sequence; formula not legible in the source]

(3) [input sequence; formula not legible in the source]

(4) [input sequence; formula not legible in the source]

where 0 ≤ i ≤ N-1 and d = NL - m, and for indices j outside the range [0, m-1] the corresponding input coefficients are taken as zero. The output data sequence is:

(5) [output sequence; formula not legible in the source]

and for indices j outside the range [0, m-1], [condition not legible in the source].

The beneficial effect of the present invention is that the design method uses a retiming algorithm to adjust the distribution of the combinational logic in the circuit, thereby raising the maximum clock frequency of the system.

Description of drawings

Fig. 1 shows the basic processing cell of the circuit obtained by analysing the MSB-first multiplication algorithm;

Fig. 2 shows the structure of the fully parallel design over GF(2^4) obtained from the MSB-first multiplication algorithm;

Fig. 3 shows the basic processing cell of the circuit after the coordinate transformation;

Fig. 4 shows the structure of the fully parallel design after the coordinate transformation;

Fig. 5 is the one-dimensional dependency graph obtained by projecting Fig. 4 when m/L is an integer;

Fig. 6 shows the internal structure of each basic module;

Fig. 7 shows the internal structure of each basic module with B(x) loaded serially;

Fig. 8 is the one-dimensional dependency graph after retiming;

Fig. 9 shows the systolic structure of the 21-stage cascade;

Fig. 10 shows the 1*8 computing unit;

Fig. 11 shows the 8*8 computing unit;

Fig. 12 shows the 163*163-bit multiplier built from 8*8-bit units;

Fig. 13 shows the coefficient of each power of the result from the MATLAB simulation;

Fig. 14 shows the Modelsim simulation results from the start to 4400 ns;

Fig. 15 shows the Modelsim simulation results from 4400 ns to 5 µs;

Fig. 16 shows the Modelsim simulation results from 5 µs to 5600 ns;

Fig. 17 shows the Modelsim simulation results from 5600 ns to 6 µs.

Embodiment

The present invention is further described below with reference to the accompanying drawings.

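Over the Galois field GF(2^m), the present invention adopts the most-significant-bit-first (MSB-first) multiplication algorithm, described as follows:

Input: $A(x)=\sum_{j=0}^{m-1} a_j x^j$, $B(x)=\sum_{j=0}^{m-1} b_j x^j$, $G(x)=x^m+\sum_{j=0}^{m-1} g_j x^j$
Output: $P(x)=A(x)B(x) \bmod G(x)$
Initialization: $t_j^{(0)}=0$ for $0 \le j \le m-1$; $t_{-1}^{(i)}=0$ for $0 \le i \le m-1$
for $i = 1$ to $m$ do
    for $j = m-1$ to $0$ do
        $t_j^{(i)} = t_{j-1}^{(i-1)} \oplus t_{m-1}^{(i-1)} g_j \oplus a_j b_{m-i}$
$P(x) = T^{(m)}(x)$

where $t_j^{(i)}$ denotes the j-th coefficient of $T^{(i)}(x)$, and $a_j$, $b_j$ and $g_j$ denote the j-th coefficients of $A(x)$, $B(x)$ and $G(x)$, respectively. In the present invention, m = 163.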
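The bit-level recurrence above can be checked in software before mapping it to cells. The sketch below is a direct, unoptimized transcription of the MSB-first loop as reconstructed here, with coefficient lists indexed by power of x; `msb_first_multiply` and `bits_from_exponents` are hypothetical helper names, and `g` holds only the low coefficients g_0..g_{m-1} (the leading x^m is implicit).

```python
def msb_first_multiply(a, b, g, m):
    """Bit-level MSB-first multiplication in GF(2^m).

    a, b : lists of m bits; a[j] is the coefficient of x^j.
    g    : the m low coefficients g_0..g_{m-1} of G(x) = x^m + sum g_j x^j.
    Returns the m coefficients of P(x) = A(x)B(x) mod G(x).
    """
    t = [0] * m                               # T^(0)(x) = 0
    for i in range(1, m + 1):                 # iteration i consumes b_{m-i}
        b_bit = b[m - i]
        top = t[m - 1]                        # t_{m-1}^{(i-1)}, shared by the whole row
        new_t = [0] * m
        for j in range(m - 1, -1, -1):
            lower = t[j - 1] if j > 0 else 0  # t_{-1}^{(i-1)} = 0
            new_t[j] = lower ^ (top & g[j]) ^ (a[j] & b_bit)
        t = new_t
    return t                                  # P(x) = T^(m)(x)


def bits_from_exponents(exponents, m):
    """Coefficient list of sum_{e in exponents} x^e, for exponents below m."""
    return [1 if j in exponents else 0 for j in range(m)]


# GF(2^4) with G(x) = x^4 + x + 1: x * x^3 = x^4 = x + 1
g4 = bits_from_exponents({0, 1}, 4)
print(msb_first_multiply(bits_from_exponents({1}, 4), bits_from_exponents({3}, 4), g4, 4))
# prints [1, 1, 0, 0]
```

Each pass of the outer loop computes T^(i)(x) = x T^(i-1)(x) mod G(x) + b_{m-i} A(x), which is why the top coefficient t_{m-1} must reach every cell of the row.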

Taking the MSB-first multiplication algorithm over GF(2^4) as an example, the structure of its fully parallel design is shown in Fig. 2. It contains 4*4 basic processing cells (i, j), where i denotes the iteration index of the circuit and j denotes the exponent index of the polynomial; all polynomial coefficients are input in parallel and the result is output in parallel. For a digit-serial structure, taking L = 2 as an example, the fully parallel structure graph is partitioned into N*N modules, each containing L*L systolic cells, where N = m/L, and is then mapped onto a one-dimensional dependency graph. However, Fig. 2 shows that there are bidirectional inputs between modules along the horizontal direction, which prevents a direct mapping onto a one-dimensional dependency graph. To solve this problem, the present invention adopts a coordinate transformation, namely:

(1) [coordinate transformation; formula not legible in the source]

After the transformation, the basic processing cell of the circuit becomes the one shown in Fig. 3, and the corresponding fully parallel design structure is shown in Fig. 4. This structure no longer has bidirectional inputs when it is partitioned into modules; the circuit realizes the same function as before the coordinate transformation, and only the directions of the inputs and outputs have changed.

If m/L is an integer, Fig. 4 is projected onto the one-dimensional dependency graph shown in Fig. 5, where a marked symbol (not legible in the source) denotes one delay unit. The internal structure of each basic module is shown in Fig. 6; to control the module, a control signal Ctrl = 011...1 of length N+1 is added. A(x) and G(x) enter the module in serial mode, B(x) enters the module in parallel mode, and the result P(x) is output L bits per clock cycle. Because the computed result $t_{m-1}^{(i-1)}$ must be broadcast to every cell in row i, the present invention adds L 1-bit registers and L MUXes to each module. To hold the data passed diagonally from the rightmost cell of region k to the leftmost cell of region k+1, a register of 3(L-1) delay units is added to each module.

Further, L MUXes and L 1-bit registers are added to each module so that B(x) can also enter the module in serial mode; when the Ctrl signal is 0, B(x) is loaded into the module, as shown in Fig. 7. The retimed circuit structure is then obtained, as shown in Fig. 8. With continuous input, this circuit outputs a computed result every N clock cycles after an initial latency of 3N+1 clock cycles.
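The latency and throughput figures above can be turned into a simple completion schedule. This is a sketch that assumes N = ceil(m/L) and back-to-back inputs, and reads "latency of 3N+1 cycles" as the cycle at which the first result is available; `systolic_schedule` is a hypothetical helper, not part of the original design.

```python
import math


def systolic_schedule(m, L, num_inputs):
    """Cycles at which num_inputs back-to-back multiplications complete.

    Uses the figures stated for the retimed structure: first result after
    3N + 1 cycles, then one result every N cycles (N = ceil(m / L) assumed).
    """
    N = math.ceil(m / L)
    latency = 3 * N + 1
    return [latency + k * N for k in range(num_inputs)]


print(systolic_schedule(163, 8, 3))   # [64, 85, 106]
```

For m = 163 and L = 8, N = 21, so the pipeline delivers one 163-bit product every 21 cycles once it is full.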

The present invention designs an 8*8 basic module and cascades 21 of them, thereby realizing an 8-bit digit-serial systolic multiplier over GF(2^163). The overall schematic is shown in Fig. 9.

The 8*8 computing unit is designed as follows. First, 8 minimum cells are combined into the 1*8 computing unit shown in Fig. 10; this unit is a purely combinational circuit. Then 8 of the 1*8 computing units are combined into the 8*8 computing unit shown in Fig. 11. In the figure, each pair of vertical lines marked with the letter r is connected to the pair of vertical lines marked with the letter g in the row above, and g indicates the presence of a delay unit. The letter y denotes a data selector (multiplexer), which supplies the initial value required by the formula when the computing unit begins a computation. At the start, the control signal is 0 and the output of y is the polyline signal above it; afterwards the control signal of y is 1 and its output is the T_i signal on its left. This computing unit needs two clock cycles to complete a computation and passes the result to the next-stage computing unit, so the control signal, delayed by two cycles, serves as the control signal of the next computing unit. In the figure, the grid-filled rectangles at the bottom are registers that delay by one clock cycle; they implement a pipeline structure and shorten the critical path. Taking into account the delay of the added green blocks, the outputs Gout and Aout are simply the inputs Gin and Ain delayed by two cycles.

The 8*8 units can form the 163*163-bit multiplier shown in Fig. 12. Since 163 is not divisible by 8, the non-divisible case must be handled. For this purpose, the 163*163 computation structure is extended to a 168*168 structure and the five extra bits are all set to zero. Because the initialization inputs of the top row and of the lowest bits on the right-hand side are all zero during the computation, the padded bits do not affect the result.
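A functional (not cycle-accurate) model of the cascade helps check the zero-padding argument. Each of the N = ceil(m/L) stages is modeled as L consecutive MSB-first iterations, which collapse to one digit step T(x) <- x^L T(x) + A(x) D_k(x) mod G(x), where D_k is the k-th L-bit digit of B(x), most significant first. The padding used here, shifting A(x) and G(x) up by d = NL - m and giving B(x) d zero high-order bits, is an assumption chosen to be consistent with d = NL - m; the exact input sequences (2) to (4) are not legible in the source. It works because x^d A(x)B(x) mod x^d G(x) = x^d (A(x)B(x) mod G(x)), so the padded low bits stay zero. All function names are hypothetical.

```python
def clmul(x, y):
    """Carry-less product of two GF(2)[x] polynomials stored as ints (bit k = x^k)."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        x <<= 1
        y >>= 1
    return r


def poly_mod(x, g):
    """Remainder of x modulo g in GF(2)[x]."""
    dg = g.bit_length() - 1
    while x.bit_length() - 1 >= dg:
        x ^= g << (x.bit_length() - 1 - dg)
    return x


def cascade_multiply(a, b, g, m, L):
    """N = ceil(m/L) stages, each applying L MSB-first steps as one digit step.

    Assumed padding: A and G shifted up by d = NL - m, B padded with d zero
    high bits, so the stage chain computes x^d * P(x).
    """
    N = -(-m // L)                                       # ceil(m / L)
    d = N * L - m
    a_p, g_p = a << d, g << d
    mask = (1 << L) - 1
    t = 0
    for k in range(N):                                   # stage k of the cascade
        digit = (b >> ((N - 1 - k) * L)) & mask          # k-th L-bit digit of B, MSB first
        t = poly_mod((t << L) ^ clmul(a_p, digit), g_p)  # T <- x^L T + A D_k mod G
    assert t & ((1 << d) - 1) == 0                       # padded low bits stay zero
    return t >> d


# GF(2^4), G(x) = x^4 + x + 1, L = 3 (so N = 2, d = 2): x * x^3 = x + 1
print(cascade_multiply(0b0010, 0b1000, 0b10011, 4, 3))   # 3
```

With m = 163 and L = 8 the same function runs 21 stages on 168-bit padded operands (d = 5), mirroring the 168*168 extension described above.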

A MATLAB program was used to compute a set of data, and the theoretical results were compared with the Modelsim simulation results. The 163-bit input data before zero padding are given as sets of the exponents of their nonzero coefficients:

(1) {i | 0 ≤ i ≤ 162 and a[i] = 1} = {0, 2, 4, 5, 6, 7, 8, 9, 10, 94, 100, 110, 147, 148, 149, 150, 151, 152, 153, 154}

(2) {i | 0 ≤ i ≤ 162 and b[i] = 1} = {0, 8, 9, 10, 11, 12, 13, 14, 15, 52, 53, 54, 55, 56, 57, 58, 59}

(3) {i | 0 ≤ i ≤ 162 and g[i] = 1} = {0, 3, 6, 7}.

The coefficients of each power of the simulated result are shown in Fig. 13, and Figs. 14 to 17 show the corresponding Modelsim simulation results. The result is output starting from the high-order bits (Fig. 14): of the first 8-bit output word only the last bit is valid, 8 result bits are then output in each clock cycle, and of the last 8-bit output word only the first two bits are valid. The first bits output in the Fig. 14 simulation are 1, 10010100, 11000001, which agree with the MATLAB result 1, 10010100, 11000001. Comparing all 163 result bits, the two are identical.
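The published test vector can be reproduced in software as a cross-check. The sketch below computes P(x) = A(x)B(x) mod G(x) with plain carry-less multiplication and reduction (a reference computation, not the systolic datapath), then splits the 163 result bits into the MSB-first output groups described above: one valid bit, then 8-bit words, then a final 2-bit group. G(x) = x^163 + x^7 + x^6 + x^3 + 1 follows from the set {0, 3, 6, 7} plus the leading x^163 term. Function names are hypothetical.

```python
def clmul(x, y):
    """Carry-less product of two GF(2)[x] polynomials stored as ints (bit k = x^k)."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        x <<= 1
        y >>= 1
    return r


def poly_mod(x, g):
    """Remainder of x modulo g in GF(2)[x]."""
    dg = g.bit_length() - 1
    while x.bit_length() - 1 >= dg:
        x ^= g << (x.bit_length() - 1 - dg)
    return x


def output_words(p, m=163, first=1, word=8):
    """Split the m result bits MSB first: `first` bits, then `word`-bit groups."""
    bits = format(p, f"0{m}b")                   # p_{m-1} ... p_0
    return [bits[:first]] + [bits[i:i + word] for i in range(first, m, word)]


A_EXPS = (0, 2, 4, 5, 6, 7, 8, 9, 10, 94, 100, 110,
          147, 148, 149, 150, 151, 152, 153, 154)
B_EXPS = (0, 8, 9, 10, 11, 12, 13, 14, 15, 52, 53, 54, 55, 56, 57, 58, 59)
G = (1 << 163) | sum(1 << e for e in (0, 3, 6, 7))   # x^163 + x^7 + x^6 + x^3 + 1

A = sum(1 << e for e in A_EXPS)
B = sum(1 << e for e in B_EXPS)
P = poly_mod(clmul(A, B), G)
print(output_words(P)[:3])   # ['1', '10010100', '11000001']
```

The grouping gives 1 + 20 * 8 + 2 = 163 bits, matching the description of the first and last output words.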

Claims

1. A design method for a systolic multiplier over GF(2^163), characterized in that: a systolic structure is used to partition the multiplier architecture over the given field; a fully parallel 8*8 module is used as the basic cell and 21 such modules are cascaded; and a retiming algorithm is used to adjust the distribution of the combinational logic in the circuit; and further in that, over the Galois field GF(2^m), the MSB-first multiplication algorithm is used when building the fully parallel 8*8 basic cell, the algorithm being as follows:

Input: $A(x)=\sum_{j=0}^{m-1} a_j x^j$, $B(x)=\sum_{j=0}^{m-1} b_j x^j$, $G(x)=x^m+\sum_{j=0}^{m-1} g_j x^j$
Output: $P(x)=A(x)B(x) \bmod G(x)$
Initialization: $t_j^{(0)}=0$ for $0 \le j \le m-1$; $t_{-1}^{(i)}=0$ for $0 \le i \le m-1$
for $i = 1$ to $m$ do
    for $j = m-1$ to $0$ do
        $t_j^{(i)} = t_{j-1}^{(i-1)} \oplus t_{m-1}^{(i-1)} g_j \oplus a_j b_{m-i}$
$P(x) = T^{(m)}(x)$

where $t_j^{(i)}$ denotes the j-th coefficient of $T^{(i)}(x)$, $a_j$, $b_j$ and $g_j$ denote the j-th coefficients of $A(x)$, $B(x)$ and $G(x)$, respectively, and m = 163.

2. The systolic multiplier design method according to claim 1, characterized in that: when designing the fully parallel 8*8 basic cell, a coordinate transformation is adopted, namely:

(1) [coordinate transformation; formula not legible in the source]

where i denotes the iteration index of the circuit and j denotes the exponent index of the polynomial, so that the structure, which has bidirectional inputs between modules along the horizontal direction, can be mapped onto a one-dimensional dependency graph.

3. The systolic multiplier design method according to claim 1, characterized in that the fully parallel structure graph is partitioned into N*N modules, each containing L*L systolic cells, and optimizing the distribution of the combinational logic of the circuit with the retiming algorithm comprises:

adding L 1-bit registers and L MUXes to each fully parallel 8*8 module, used to broadcast the computed result $t_{m-1}^{(i-1)}$ to every cell in row i;

4. The systolic multiplier design method according to claim 1, characterized in that: when the fully parallel 8*8 module is first built over GF(2^163), because the order 163 of the Galois field is not divisible by the number 8 of systolic cells in a module, the idea of "zero padding" is introduced: an L-bit digit-serial systolic multiplier containing N basic modules can operate over GF(2^m) with the following input sequences, where [condition not legible in the source]:

(2) [input sequence; formula not legible in the source]

(4) [input sequence; formula not legible in the source]

where 0 ≤ i ≤ N-1 and d = NL - m, and for indices j outside the range [0, m-1] the corresponding input coefficients are taken as zero; the output data sequence is:

(5) [output sequence; formula not legible in the source]

and for indices j outside the range [0, m-1], [condition not legible in the source].