CN1719400A

CN1719400A - Self-programming parallel linear feedback shift register (AP2LFSR)

Info

Publication number: CN1719400A
Application number: CN 200510090589
Authority: CN
Inventors: Luo Jingyuan (罗静远)
Original assignee: Individual
Current assignee: Individual
Priority date: 2005-08-19
Filing date: 2005-08-19
Publication date: 2006-01-11

Abstract

The present invention discloses a parallel-processing computer system with self-programming capability, using CRC (cyclic redundancy check) as its example implementation. The method parameterizes the divisor polynomial of a division and takes it as an input variable of an IP functional module. It applies a state-transition matrix as a linear transformation on a vector, and thereby computes the quotient polynomial and the remainder polynomial in parallel, with a controllable parallel width. The transformation matrix itself is generated by a dedicated self-programming method.

Description

Self-programming parallel LFSR (AP2LFSR)

Technical field

The present invention relates to the field of logic signal processing. More particularly, it relates to a method and apparatus that apply a linear transformation to logic signals to perform the function of a parallel divider, for purposes such as encryption and decryption, error detection and correction, and data compression. Examples include CRC, m-sequences, scrambling, and BIST data compression. To disclose the essential technical points concisely and clearly, the implementation is described using the cyclic redundancy check (CRC) as an example. The essence of CRC coding is to divide the input information bit stream by a generator polynomial and to append the resulting remainder to the information bits as check bits, so the example is without loss of generality. The description is intended to let those skilled in the art understand the technical solution and use it to solve the stated technical problem. This reference does not limit the invention.

Background technology

With the arrival of the information age and the rapid development of microelectronics, the cyclic redundancy check (CRC) is now widely used in computer storage, computing systems, and all kinds of digital communication systems, and has become a required part of the CCITT recommendations for various kinds of circuit transmission. It is also often applied in very-large-scale and ultra-large-scale integrated circuit (VLSI, ULSI) design, to improve the yield of integrated circuit chips and reduce chip cost. Practical algorithms (including C-language implementations and Verilog hardware implementations) are common in papers and monographs in the field and are freely available on the Internet, so they are not repeated here (see, for example, http://www.acius.com/ACIDOC/CMU/CMU79909.HTM and http://utopia.knoware.nl/users/eprebel/Communication/CRC; all articles cited here are incorporated by reference). The parallel CRC generation circuit is fully equivalent to the serial CRC generation circuit: a simulation in C of the ECRC of the Data Link Layer in PCI-Express gave identical results for both.

The shortcomings of the prior art are described on page 7 of the invention patent "A method for multi-channel multi-bit parallel calculation of CRC codes" by Xu Zhanqi of Xidian University (Xi'an University of Electronic Science and Technology), in particular with respect to the work of G Metz shellfish lattice of Telefonaktiebolaget LM Ericsson on how to reduce redundancy in a parallel CRC generation circuit. The derivation in Xu Zhanqi's patent, however, is cumbersome, difficult to implement concretely, and of limited practical use. In short, earlier research on this problem by scholars and engineers lacked a sufficient mathematical foundation: the reasoning did not start from basic mathematical principles, but instead tried to derive the general case from special examples by means of computational tricks. This is methodologically unsound, and the inventors themselves expended a great deal of effort with little result. Previous research on this topic therefore generally lacks systematic and regular treatment.

Compared with the background art, the invention has the following beneficial effects. In terms of performance and efficiency, the parallel CRC calculation circuit of the invention is very fast and its speed is controllable. In terms of cost, everything is done automatically in hardware logic, with no software overhead. The method is general: it takes any generator polynomial selected by the user as an input parameter, programs itself through on-chip computation (without any software overhead) to produce a high-speed parallel CRC generation circuit, and lets the user control the width of the parallel operation to obtain the best trade-off between speed and area.

Summary of the invention

The example of the invention is the generation of the CRC (cyclic redundancy check). CRC is used to ensure reliable data transfer in various transmission systems. The sender computes CRC check bits over the data to be transmitted, appends the check bits to the data, and sends them together. The receiver performs the same CRC calculation on the received data. If the computed value matches the received CRC code, the transmission is taken to be error-free; otherwise an error has occurred. Generating CRC codes serially or in parallel is existing technology, the circuits are well known, and the literature has studied them to some extent. So far, however, no implementation method has been found that parameterizes the generator polynomial and takes it as an input variable of an IP functional module. Nor is there an algorithm that uses a state-transition matrix to linearly transform a vector and thereby achieves parallel operation with a controllable parallel width. Furthermore, there is no method that generates the transformation matrix by self-programming.
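The send-and-verify flow described above can be sketched in a few lines of Python. This is a bit-serial reference model, not the parallel circuit of the invention. The function names `poly_mod`, `crc_encode`, and `crc_check`, the highest-degree-first bit order, and the sample generator polynomial x^3 + x + 1 are assumptions made for illustration.

```python
def poly_mod(bits, gen):
    """Remainder of bits(x) / gen(x) over GF(2).

    Both arguments are lists of 0/1 with the highest-degree coefficient first
    (an assumption of this sketch). Returns len(gen) - 1 remainder bits.
    """
    work = list(bits)
    for i in range(len(work) - len(gen) + 1):
        if work[i]:                      # leading bit set: subtract (XOR) the divisor
            for j, g in enumerate(gen):
                work[i + j] ^= g
    return work[-(len(gen) - 1):]


def crc_encode(data, gen):
    """Sender side: append the check bits to the data bits."""
    r = len(gen) - 1
    return list(data) + poly_mod(list(data) + [0] * r, gen)


def crc_check(codeword, gen):
    """Receiver side: a valid codeword leaves a zero remainder."""
    return not any(poly_mod(codeword, gen))


gen = [1, 0, 1, 1]                       # x^3 + x + 1
codeword = crc_encode([1, 1, 0, 1], gen)
corrupted = codeword[:]
corrupted[5] ^= 1                        # flip one bit in transit
print(codeword, crc_check(codeword, gen), crc_check(corrupted, gen))
```

The receiver here divides the whole received word and tests for a zero remainder, which is equivalent to recomputing the check bits and comparing them with the received ones.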

How the state-transition matrix is obtained: the traditional way to achieve parallelism is to carry out tedious iterative calculations in advance, separately for each given generator polynomial and parallel width. Whenever either the generator polynomial or the parallel width changes, the whole iterative calculation must be redone, so the preparatory work for parallel computation is very costly. In the present invention, specific circuit logic designed inside the chip lets the chip solve for the parallel transformation matrix by its own on-chip computation, without any preparatory work. The traditional approach is also poor in portability and flexibility. More importantly, if the system requires the polynomial or the parallel width to change with the application (a requirement that is already unavoidable in some real-time systems), the traditional approach becomes unworkable. What is then needed is on-chip self-programming logic that responds to parameter changes in real time and generates the parallel transformation matrix quickly and flexibly. In the traditional approach, parallelism is obtained by iterating step by step on the selected polynomial, which amounts to raising the state-transition matrix to a power (the exponent equals the number of bits processed in parallel). The present invention instead obtains this matrix directly by table look-up, and the table is built by logic circuits in a specially designed way. See the embodiment below.
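To illustrate what the traditional approach computes, the sketch below builds the one-step state-transition matrix of an LFSR and raises it to the N-th power over GF(2). The names `companion`, `matmul_gf2`, and `matpow_gf2`, and the choice of the basis (1, x, ..., x^(r-1)) for the register contents, are assumptions introduced here and do not come from the source.

```python
def companion(taps):
    """One-step state-transition matrix F for multiplication by x modulo G(x).

    taps[i] is the coefficient of x^i in G(x) for i = 0..r-1; the x^r term is
    implicit. Column j of F holds the coefficients of x^(j+1) mod G(x).
    """
    r = len(taps)
    F = [[0] * r for _ in range(r)]
    for j in range(r - 1):
        F[j + 1][j] = 1                  # x^j -> x^(j+1), a plain shift
    for i in range(r):
        F[i][r - 1] = taps[i]            # x^(r-1) -> x^r, reduced through the taps
    return F


def matmul_gf2(A, B):
    """Matrix product over GF(2): AND for multiply, XOR (sum mod 2) for add."""
    return [[sum(a & b for a, b in zip(row, col)) % 2 for col in zip(*B)]
            for row in A]


def matpow_gf2(F, n):
    """F raised to the n-th power by repeated multiplication."""
    r = len(F)
    P = [[int(i == j) for j in range(r)] for i in range(r)]
    for _ in range(n):
        P = matmul_gf2(P, F)
    return P


F = companion([1, 1, 0])                 # G(x) = x^3 + x + 1
F2 = matpow_gf2(F, 2)                    # advances the register two steps at once
```

With these assumptions, the columns of `F2` are x^2, x^3, and x^4 reduced modulo G(x). Changing either the taps or the exponent means redoing the whole product chain, which is the cost that the look-up approach is meant to avoid.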

Mathematical principle

Take an $(n, k)$ CRC code (code length $n$, data length $k$, check length $n-k$) with generator polynomial $G_{n-k}x^{n-k} + G_{n-k-1}x^{n-k-1} + G_{n-k-2}x^{n-k-2} + \dots + G_1x + 1$ as an example. Starting from an initial value (whose length equals the number of shift registers in the LFSR, that is, $n-k$), $n$ feedback shifts yield a matrix with $n-k$ rows and $n$ columns. Continuing the feedback shift loops back to the leftmost $(n-k) \times (n-k)$ identity matrix. Removing this identity matrix leaves the $(n-k) \times k$ matrix on the right, and this matrix with $n-k$ rows and $k$ columns is the transformation matrix we want. With it, the input data ($k$ rows, 1 column) can be premultiplied by the $(n-k) \times k$ matrix to obtain the check bits corresponding to the data ($n-k$ rows, 1 column). Appending these $n-k$ check bits to the $k$ data bits completes the encoding. Mathematically, the traditional iterative approach amounts to computing this $(n-k) \times k$ transformation matrix in advance, by hand or in software. In the present invention, a specific state-matrix generating vector is loaded into the LFSR, and the chip computes the transformation matrix by itself through feedback shifts.
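The construction above can be written compactly as follows. This is a sketch under assumptions not stated in the source: the LFSR is taken to be seeded with the polynomial 1, so that the register content after $j$ shifts is $x^j \bmod G(x)$, and the symbols $\mathbf{s}_j$, $T$, $\mathbf{d}$, and $\mathbf{c}$ are names introduced here for illustration.

```latex
% Register content after j shifts, as a column of n-k coefficients
\mathbf{s}_j \;=\; x^{j} \bmod G(x), \qquad j = 0, 1, \dots, n-1

% The first n-k columns are unit vectors, because x^j needs no reduction for j < n-k
\big[\, \mathbf{s}_0 \;\; \mathbf{s}_1 \;\cdots\; \mathbf{s}_{n-1} \,\big]
  \;=\; \big[\, I_{n-k} \;\big|\; T \,\big],
\qquad
T \;=\; \big[\, \mathbf{s}_{n-k} \;\cdots\; \mathbf{s}_{n-1} \,\big]

% Check bits as a matrix-vector product over GF(2)
\mathbf{c} \;=\; T\,\mathbf{d} \pmod 2
\quad\Longleftrightarrow\quad
c(x) \;=\; \sum_{i=0}^{k-1} d_i\, x^{\,n-k+i} \bmod G(x)
```

Under this reading, each column of $T$ is the remainder contributed by one data bit, and the product $T\mathbf{d}$ adds up (modulo 2) the remainders of the data bits that are set.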

More generally, the parallel width can also be controlled by suitably choosing a submatrix of this transformation matrix. This rests on a simple mathematical principle: dividend = divisor × quotient + remainder.

Let the data to be encoded (that is, the dividend) be $A_{n-k}x^{n-k} + A_{n-k-1}x^{n-k-1} + A_{n-k-2}x^{n-k-2} + \dots + A_1x + 1$, and let the parallel width be $N$:

The dividend is divided, as required by the parallel width $N$, into segments from the low-order end to the high-order end, $[M/N]$ segments in all. Let $Y = x^N$. The original dividend can then be written as an expansion in powers of $Y$, in which the coefficient of each term is a polynomial of degree at most $N$: $[A_{[M/N]} + \dots + A_{[M/N]-(N-1)}]\,Y^{[M/N]} + \dots + [A_{2N-1} + \dots + A_N]\,Y + [A_{N-1} + \dots + A_1 + 1]$;

Segmented division can then be carried out. Within each segment, the k × k submatrix of the (n-k) × k transformation matrix obtained above is applied to the input data vector, so that each step advances the state by N transitions (the parallel width) and computes the quotient and the remainder in parallel. The quotient of the highest-power segment of Y is output as the most significant part of the quotient result. The remainder of the highest-power segment of Y is added to the dividend of the next-highest-power segment, and the sum becomes the new dividend of that segment. This new dividend is in turn premultiplied by the k × k transformation submatrix, which yields the quotient and remainder of the next-highest-power segment of Y. That quotient is output as the next most significant part of the quotient result, and the remainder is added to the dividend of the next lower power segment of Y to form its new dividend. The cycle repeats until the whole quotient has been obtained; the remainder at that point is the final remainder. The transformation submatrix depends on the parallel width: it is the k-th order square matrix obtained by sliding a window of width k to the right over the (n-k) × k transformation matrix obtained above, by as many positions as the parallel width, that is, 1 position for a parallel width of 1, 2 positions for a width of 2, and so on.
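A minimal Python sketch of the segmented idea follows. It is a software model under stated assumptions, not the circuit of the invention: polynomials are stored as integers whose bit i is the coefficient of x^i, only the remainder is produced (the quotient output is omitted for brevity), and the names `xpow_table`, `remainder_serial`, and `remainder_parallel` are hypothetical.

```python
def xpow_table(g, r, count):
    """table[j] = x^j mod G(x), generated the way an LFSR would: shift, then
    XOR in the taps whenever a 1 is shifted out of the top stage.

    g is G(x) as an integer including the x^r term (0b1011 for x^3 + x + 1).
    """
    table, s = [], 1
    for _ in range(count):
        table.append(s)
        s <<= 1
        if s >> r & 1:
            s ^= g                       # clears bit r and applies the taps
    return table


def remainder_serial(bits, g, r):
    """Reference: one dividend bit per step, highest-degree bit first."""
    s = 0
    for a in bits:
        s = (s << 1) | a
        if s >> r & 1:
            s ^= g
    return s


def remainder_parallel(bits, g, r, n):
    """Same remainder, consuming n dividend bits per step via table look-up."""
    table = xpow_table(g, r, r + n)
    bits = [0] * (-len(bits) % n) + list(bits)   # leading zeros change nothing
    s = 0
    for i in range(0, len(bits), n):
        chunk = 0
        for b in bits[i:i + n]:
            chunk = (chunk << 1) | b
        v = (s << n) | chunk             # old remainder * x^n + new segment
        s = 0
        for j in range(r + n):           # AND/XOR: add table rows selected by v
            if v >> j & 1:
                s ^= table[j]
    return s


bits = [1, 0, 0, 1, 0, 0, 0]             # x^6 + x^3
print([remainder_parallel(bits, 0b1011, 3, n) for n in (1, 2, 4)])
```

Each pass through the outer loop plays the role of one segment: the running remainder is carried into the next segment, and the table rows it selects are combined with XOR. Changing `n` only changes how many table rows are needed, which mirrors the controllable parallel width.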

Embodiment

A mode-select signal can be provided. The chip is set to the programming state at power-on to complete initialization; it then switches to the running state, in which the parallel divider starts working:

Programming state

This state completes some initial settings and solves for the transformation matrix. (Accompanying drawing 2 illustrates one possible circuit implementation; the invention is not limited to this implementation, which serves only as a reference.)

First, a group of n flip-flops (for example, 32 D flip-flops) is provided at the parameter input port of the module (which can be an IP functional module or a standalone chip) to record the input generator polynomial. Most current applications use polynomials of degree 32 or lower, and only a few use polynomials of degree up to 64, so the hardware overhead is small. The polynomial coefficient information recorded in the flip-flops controls the presence or absence of each XOR gate (feedback tap) of the LFSR. In this way any input polynomial can serve as the generator polynomial.

Then the mode-select signal is set so that the module enters the programming state. A non-zero arbitrary sequence is loaded in parallel into the LFSR as its initial value (if a systematic CRC code is ultimately desired, the sequence whose MSB is 1 and whose other bits are 0 is loaded). Next, n shift/feedback operations are carried out, one per clock tick. At each clock tick the full content of the LFSR is recorded as one row of the transformation matrix, so that after n clocks the complete transformation matrix has been obtained. To reduce overhead, the hardware implementation uses an SRAM: at each clock, the content of the LFSR after the shift/feedback is written to an SRAM address location. This produces the look-up table mentioned above. The look-up table is the core of the vector multiplication used for encoding/decoding in the running state.
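The programming state can be modelled in software as follows. This is a sketch under assumptions, since accompanying drawing 2 is not reproduced here: the LFSR is modelled as a Galois-style register with feedback from the last stage, stages are listed lowest-order first, the seed has a 1 in the first stage, and the name `program_lookup_table` and the list standing in for the SRAM are hypothetical.

```python
def program_lookup_table(taps, n_clocks):
    """Model of the programming state.

    taps[i] is the content of the i-th tap flip-flop, i.e. the coefficient of
    x^i in the generator polynomial for i = 0..r-1 (the x^r term is implicit).
    A tap bit of 1 enables the XOR gate at that stage.
    Returns the list of register snapshots, one SRAM word per clock tick.
    """
    r = len(taps)
    state = [1] + [0] * (r - 1)          # seed loaded in parallel
    sram = []
    for _ in range(n_clocks):
        sram.append(state[:])            # write the register content to SRAM
        feedback = state[-1]             # bit leaving the last stage
        state = [0] + state[:-1]         # shift by one stage
        state = [s ^ (feedback & t) for s, t in zip(state, taps)]
    return sram


# Generator polynomial x^3 + x + 1: tap flip-flops hold 1, 1, 0
for word in program_lookup_table([1, 1, 0], 7):
    print("".join(map(str, word)))
```

Because the taps are ordinary data, loading a different bit pattern into `taps` reprograms the table without any change to the code, which is the software analogue of the parameterized generator polynomial.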

Take the (7,4) CRC code (data length 4, check length 3, code length 7) with generator polynomial $x^3 + x + 1$ as an example; please refer to accompanying drawing 1:

Starting from the initial value 100 (MSB on the left), seven feedback shifts produce the 3 × 7 matrix of accompanying drawing 1. Continuing the feedback shift loops back to the leftmost 3 × 3 identity matrix. Removing this 3 × 3 identity matrix leaves the 3 × 4 matrix on the right, and this matrix with 3 rows and 4 columns is the transformation matrix we want. With this 3 × 4 transformation matrix, the input data (4 rows, 1 column) can be premultiplied by it to obtain the check bits corresponding to the data (3 rows, 1 column). For example, if the input data is 1001, the check result is 011 and the coded result is 1001011.
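The arithmetic of this example can be written out as follows. Accompanying drawing 1 is not reproduced here, so the matrix is a reconstruction under an assumption: successive register contents are written as columns with the first (lowest-order) stage on top, which is the convention that reproduces the check bits 011 quoted in the text. The symbols $T$ and $\mathbf{d}$ are introduced for illustration.

```latex
% Register contents after 0..6 shifts for G(x) = x^3 + x + 1, seed 100
\begin{bmatrix}
1 & 0 & 0 & 1 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 & 1 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & 1 & 1
\end{bmatrix}
= \big[\, I_3 \;\big|\; T \,\big]

% Check bits for the data 1001 (arithmetic modulo 2)
T\,\mathbf{d} =
\begin{bmatrix}
1 & 0 & 1 & 1 \\
1 & 1 & 1 & 0 \\
0 & 1 & 1 & 1
\end{bmatrix}
\begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}
=
\begin{bmatrix} 1 \oplus 1 \\ 1 \oplus 0 \\ 0 \oplus 1 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}
```

Appending the check bits 011 to the data bits 1001 gives the codeword 1001011 stated in the text.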

Mathematically, the traditional iterative approach amounts to computing this transformation matrix in advance, by hand or in software. In the present invention, a specific state-matrix generating vector (100 in this example) is loaded into the LFSR, and the chip computes the transformation matrix by itself through feedback shifts.

Running status

This state performs the encoding/decoding of the input data.

The so-called parallel operation is a matrix multiplication: the transformation matrix acts on a vector (the data stream to be encoded), so that the vector transformation that serial shifting needs many steps to achieve is obtained in one step. Vector multiplication over the Boolean field is essentially a dot product, which is nothing more than a combination of logical multiplication and logical addition. In hardware it can be implemented entirely with XOR gates (modulo-2 addition) and AND gates (modulo-2 multiplication). The matrix-by-vector multiplication can therefore be implemented with an AND/XOR gate array controlled by the look-up table: the information stored in the SRAM dynamically controls the matrix linear operation, and the AND-XOR gate array completes the dot product (programmable dynamic connection).

The following uses a basic AND/XOR gate array unit as an example for further explanation (this is only one possible implementation, for reference); please refer to accompanying drawing 3. First, the transformation matrix obtained above (the 3 × 7 matrix of accompanying drawing 1 in this example) is stored in the SRAM cells. This information serves as the line control signals of the programmable dynamic connection. "Programmable" here means that the AND array is programmable while the XOR array is fixed (somewhat similar to the form of a PLA). Each SRAM bit drives one input of a two-input AND gate and thereby controls the AND array; the other input is connected to the corresponding bit of the vector to be transformed. This performs the bitwise logical multiplication of the components in the Boolean-field vector multiplication. The outputs of the AND gates are then coupled to an XOR gate (each is connected to one input of a multi-input XOR gate), which performs the modulo-2 addition of the products. The output of this XOR gate is one column of the transformed row vector (or one row of the transformed column vector). The vector multiplication is thus implemented with logic gates, and the check bits of the input data (that is, the remainder of the polynomial division) are obtained.
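The gate array described above can be mimicked in Python with bitwise AND and XOR. This is an illustrative model only. The names `and_xor_row` and `gf2_matvec` are hypothetical, and the three SRAM words are the right-hand 3 × 4 part of the table for x^3 + x + 1, written under the assumption that register contents are listed lowest-order stage first.

```python
from functools import reduce
from operator import xor


def and_xor_row(sram_word, vector):
    """One output bit: a rank of two-input AND gates feeding one wide XOR gate.

    Each SRAM bit gates the corresponding bit of the vector to be transformed;
    the XOR gate then adds the products modulo 2.
    """
    products = [c & v for c, v in zip(sram_word, vector)]
    return reduce(xor, products, 0)


def gf2_matvec(sram_words, vector):
    """Matrix-by-vector product over GF(2), one SRAM word per output bit."""
    return [and_xor_row(word, vector) for word in sram_words]


sram_words = [
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
]
data = [1, 0, 0, 1]
check = gf2_matvec(sram_words, data)
print(data + check)
```

In hardware all output bits are produced at the same time; the list comprehension in `gf2_matvec` is only a sequential stand-in for that parallel array.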

Conclusion

The present invention is therefore well suited to achieving the above objects and obtaining the described functions and advantages, as well as other inherent beneficial effects. Although the invention has been described and illustrated with reference to specific preferred embodiments, this reference does not imply any limitation of the invention, and no such limitation should be inferred. Those of ordinary skill in the relevant art may make various variations, improvements, modifications, and equivalent substitutions in form or function, and use this scheme to solve the stated technical problem. The preferred embodiments illustrated and described are examples only and do not limit the scope. The invention is therefore limited only by the spirit and scope of the appended claims, with full recognition given to equivalents in all respects. Although the detailed description directly presents specific exemplary embodiments, those of ordinary skill in the art may make various modifications and substitutions to these embodiments. The invention covers any modification or substitution falling within the scope of the claims.

Inventor email:luo_jingyuan@hotmail.com


Claims

A parallel-processing computer system with self-programming capability, the computer system comprising:
1. A parallel algorithm for a programmable divider (whose result includes both the quotient and the remainder), characterized in that the parallel operation is carried out as a linear transformation using a state-transition matrix.
2. The divider of claim 1, characterized in that the parallel width is controllable, that is, a segmented parallel algorithm is adopted.
3. The divider of claim 1, characterized in that the transformation matrix is produced by the self-programming capability of the LFSR, without any prior calculation or software overhead.
4. The divider of claim 1, characterized in that the divisor is programmable, that is, the divisor can serve as an input parameter of the module (which can be an IP functional module or a standalone chip). The presence or absence of each feedback tap is determined by an AND gate array controlled by memory.
5. A method of writing characteristic parameters into an SRAM, dynamically controlling a matrix linear operation by the information stored in the SRAM, and completing the dot product with an AND-XOR gate array (programmable dynamic connection), the method comprising the following steps:

a) writing the values of the matrix (or of a block submatrix of it) into the SRAM cells row by row (or column by column);

b) using each SRAM bit as one input of a two-input AND gate to control the AND array, the other input being connected to the corresponding bit of the vector to be transformed;

c) coupling the bits of the vector to be transformed, after they pass through the AND gates of step b), to an XOR gate (each connected to one input of a multi-input XOR gate), the output of the XOR gate being one column of the row vector to be transformed (or one row of the column vector to be transformed).
6. The computer system of claim 3, wherein the self-programming capability is implemented with SRAM, but is not limited to any specific storage medium.