CN101599302B

CN101599302B - High-efficiency storage method for decoded codewords in an FPGA-based LDPC decoder

Info

Publication number: CN101599302B
Application number: CN 200910089662
Authority: CN
Inventors: 谢天娇; 王菊花; 宋颖; 杨新权; 李立
Original assignee: Xian Institute of Space Radio Technology
Current assignee: Xian Institute of Space Radio Technology
Priority date: 2009-07-23
Filing date: 2009-07-23
Publication date: 2012-05-09
Anticipated expiration: 2029-07-23
Also published as: CN101599302A

Abstract

The invention discloses a high-efficiency method for storing the decoded codeword in an FPGA-based LDPC decoder. In the method, the decoded codeword bits share one storage block with the extrinsic information (or the channel information), which effectively reduces the decoder's demand for storage resources, and a single read of the storage block retrieves both the codeword bits and the extrinsic information. Consequently, the parity-check equation computation unit PCU and the check node update unit CNU can share one set of address information, so the PCU needs no additional address generation unit. Finally, the processing units VNU, CNU and PCU are implemented with fine-grained (step-refined) pipelining, which effectively reduces the critical path delay of the decoder and is a necessary condition for increasing LDPC decoder throughput. When the method is implemented on an FPGA, it saves the considerable resources otherwise needed to store the codeword separately and the logic resources needed to generate addresses for the PCU, and it also improves decoder throughput.

Description

High-efficiency storage method for decoded codewords in an FPGA-based LDPC decoder

Technical field

The present invention relates to a method for storing the decoded codeword in an LDPC decoder, specifically a method in which the decoded codeword shares one storage block with the extrinsic information (or channel information). It is an efficient storage scheme that occupies few FPGA resources and yields high decoding throughput.

Background art

In modern digital communication systems, error-correcting codes are routinely used to ensure that data are transmitted reliably and efficiently. With the development of digital communications and the emergence of high-speed data services, the study and use of error-correcting codes has become increasingly important. Theoretical research shows that LDPC codes are the best-performing error-correcting codes currently known: an irregular LDPC code of rate 1/2 and length 10^7 on the AWGN channel performs only 0.04 dB from the capacity limit, closer to the Shannon limit than any error-correcting code reported so far. When Gallager invented LDPC codes in the early 1960s, they were not adopted because of the hardware limitations of the time; with the development of large-scale integrated circuit technology, LDPC codes have moved from theoretical research into practical application. Although LDPC codes have linear decoding complexity, FPGA implementations of LDPC decoders not only occupy a large amount of FPGA resources but also have low decoding throughput, which cannot meet the needs of high-speed data transmission.

The prior art contains many publications on LDPC decoders, but in them the decoded codeword is stored separately: in dedicated block RAM (representative document: K. Shimizu et al., "Partially-parallel LDPC decoder based on high-efficiency message-passing algorithm," in Proc. Int. Conf. on Computer Design (ICCD), Oct. 2005, pp. 503-510), in distributed RAM (representative document: Y. Chen and D. Hocevar, "A FPGA and ASIC implementation of rate 1/2, 8088-b irregular low density parity check decoder," in Proc. IEEE GLOBECOM, 2003, pp. 113-117), or in registers (representative document: Zhongfeng Wang and Zhiqiang Cui, "Low-Complexity High-Speed Decoder Design for Quasi-Cyclic LDPC Codes," IEEE Trans. on VLSI Systems, vol. 15, no. 1, Jan. 2007, pp. 104-114). Both register storage and distributed RAM consume the logic resources of the FPGA chip, i.e., slices.

Summary of the invention

The technical problem solved by the present invention is to overcome the shortcomings of existing techniques for storing internal information in LDPC decoders. It proposes a storage method that improves the utilization of the decoder's storage resources, saving both the considerable resources otherwise used to store the decoded codeword separately and the logic resources used to generate addresses for the PCU.

A further technical problem solved by the present invention is increasing decoder throughput, which is achieved by a design method using fine-grained internal pipelining.

Technical solution one of the present invention is a high-efficiency method for storing the decoded codeword in an FPGA-based LDPC decoder. The LDPC decoder comprises storage blocks RAM_f and RAM_m with a storage width of (Q+1) bits, a variable node processing unit VNU, a check node processing unit CNU, and a parity-check equation computation unit PCU. The method comprises the following steps:

(1) Initialization: the LDPC decoder extends the received Q-bit channel information to (Q+1) bits and stores it in RAM_f; RAM_m is initialized to all zeros; the iteration counter is initialized to iter = 0 and the maximum number of iterations is set to MAX_ITER;

(2) Variable node update:

a) Read (Q+1)-bit data A from storage block RAM_f and (Q+1)-bit data B from storage block RAM_m;

b) Split A and B each into a Q-bit part and a 1-bit part: A splits into the Q-bit channel information A1 and the 1-bit codeword information A0; B splits into the Q-bit extrinsic information B1 and the 1-bit codeword information B0;

c) Feed the channel information A1 and the extrinsic information B1 to the VNU; after processing these inputs, the VNU produces the updated channel information A1_new, the updated extrinsic information B1_new, and the decoded codeword bit C0_new for this iteration;

d) Merge A1_new and C0_new into the (Q+1)-bit word A_new and write it back to RAM_f at the address from which A was read; merge B1_new and C0_new into the (Q+1)-bit word B_new and write it back to RAM_m at the address from which B was read, where it becomes the new data B stored in RAM_m;

(3) Check node update:

i. Read (Q+1)-bit data B from storage block RAM_m and split it into the Q-bit extrinsic information B1 and the 1-bit codeword information B0;

ii. Feed the extrinsic information B1 from step i to the CNU and the codeword information B0 to the PCU. The CNU produces the updated Q-bit extrinsic information B1_new, which is extended to the (Q+1)-bit word B_new and written back to RAM_m at the address from which B was read, becoming the new data B stored in RAM_m. The PCU operates on the codeword information B0 to obtain the syndrome vector s. This iteration then ends, and iter is incremented by 1;

iii. Check whether the syndrome vector s is zero or the iteration count has reached MAX_ITER. If s = 0, the updated decoded codeword is a valid codeword; go to step iv. If s ≠ 0 and iter = MAX_ITER, stop iterating (decoding has failed) and go to step iv. Otherwise, continue iterating from step (2);

iv. Read the data A_new from storage block RAM_f and split out its most significant bit; this bit is the decoded codeword bit to be output.
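To make technical solution one concrete, the Python sketch below runs steps (1) to (3) on a toy code. It is a functional model under stated assumptions, not the patented hardware: a (7,4) Hamming parity-check matrix stands in for the QC-LDPC matrix; RAM_f and RAM_m are modelled as dicts keyed by variable index and by edge (row, column); both hold Q-bit sign-magnitude fields with the codeword bit in the MSB (the embodiment keeps RAM_f in two's complement, which this sketch does not); A1_new is assumed equal to A1; and the CNU uses normalized min-sum with alpha = 0.75 implemented by shifts. The names `decode`, `pack`, `unpack`, `to_sm`, `from_sm` and `cnu` are hypothetical.

```python
Q = 7                    # message width in bits (Q = 7 in the embodiment)
SIGN = 1 << (Q - 1)      # sign bit of a Q-bit sign-magnitude field
MAG = SIGN - 1           # largest representable magnitude (63)


def to_sm(v):
    """Signed int -> Q-bit sign-magnitude field, saturating at +/-MAG."""
    m = min(abs(v), MAG)
    return (SIGN | m) if v < 0 else m


def from_sm(f):
    """Q-bit sign-magnitude field -> signed int."""
    return -(f & MAG) if f & SIGN else f & MAG


def pack(bit, field):
    """Merge: the codeword bit becomes the MSB (bit Q) of a (Q+1)-bit word."""
    return (bit << Q) | field


def unpack(word):
    """Split a (Q+1)-bit word into (1-bit codeword info, Q-bit field)."""
    return word >> Q, word & ((1 << Q) - 1)


def cnu(vals):
    """Normalized min-sum check update; alpha = 0.75 as (x >> 1) + (x >> 2)."""
    mags = [abs(v) for v in vals]
    i1 = mags.index(min(mags))
    m1, m2 = mags[i1], min(mags[:i1] + mags[i1 + 1:])
    sign_all = sum(v < 0 for v in vals) % 2
    out = []
    for i, v in enumerate(vals):
        mag = m2 if i == i1 else m1
        mag = (mag >> 1) + (mag >> 2)
        out.append(-mag if sign_all ^ (v < 0) else mag)
    return out


def decode(H, llr, max_iter=10):
    edges = [(r, c) for r, row in enumerate(H) for c, h in enumerate(row) if h]
    ram_f = {c: pack(0, to_sm(v)) for c, v in enumerate(llr)}  # step (1)
    ram_m = {e: 0 for e in edges}                              # all zeros
    syndrome, it = [1], 0
    while it < max_iter:
        # step (2): variable node update; both RAMs receive the new codeword bit
        for c in range(len(llr)):
            _, a1 = unpack(ram_f[c])
            mine = [e for e in edges if e[1] == c]
            ins = [from_sm(unpack(ram_m[e])[1]) for e in mine]
            total = from_sm(a1) + sum(ins)
            c0 = 1 if total < 0 else 0
            ram_f[c] = pack(c0, a1)                  # assumption: A1_new = A1
            for e, x in zip(mine, ins):
                ram_m[e] = pack(c0, to_sm(total - x))
        # step (3): CNU and PCU consume the same RAM_m reads (shared addresses)
        syndrome = []
        for r in range(len(H)):
            mine = [e for e in edges if e[0] == r]
            words = [unpack(ram_m[e]) for e in mine]
            syndrome.append(sum(b for b, _ in words) % 2)  # PCU: XOR of B0 bits
            new = cnu([from_sm(f) for _, f in words])
            for e, (b, _), x in zip(mine, words, new):
                ram_m[e] = pack(b, to_sm(x))
        it += 1
        if not any(syndrome):
            break
    bits = [unpack(ram_f[c])[0] for c in range(len(llr))]  # step iv: MSBs
    return bits, not any(syndrome), it
```

With H = [[1,1,1,0,1,0,0],[1,1,0,1,0,1,0],[1,0,1,1,0,0,1]] and integer LLRs [-4, 12, 12, 12, 12, 12, 12] (units of 0.25, bit 0 received with the wrong sign), `decode` returns ([0, 0, 0, 0, 0, 0, 0], True, 2): the first pass leaves a nonzero syndrome, and the second corrects bit 0. Note that the codeword bit never needs its own memory; it travels in the MSB of words that are read anyway.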

Technical solution two of the present invention is a high-efficiency method for storing the decoded codeword in an FPGA-based LDPC decoder. The LDPC decoder comprises storage blocks RAM_f and RAM_m with a storage width of (Q+1) bits, a variable node processing unit VNU, a check node processing unit CNU, and a parity-check equation computation unit PCU. The method comprises the following steps:

(1) Initialization: the LDPC decoder extends the received Q-bit channel information to (Q+1) bits and stores it in RAM_f; RAM_m is initialized with this (Q+1)-bit information at the positions of the nonzero entries of the decoder's parity-check matrix; the iteration counter is initialized to iter = 0 and the maximum number of iterations is set to MAX_ITER;

(2) Check node update:

ii. Feed the extrinsic information B1 from step i to the CNU and the codeword information B0 to the PCU. The CNU produces the updated extrinsic information B1_new; the updated Q-bit B1_new is extended to the (Q+1)-bit word B_new and written back to RAM_m at the address from which B was read, becoming the new data B stored in RAM_m. The PCU computes the syndrome vector s. This iteration then ends, and iter is incremented by 1;

iii. Check whether the syndrome vector s is zero or the iteration count has reached MAX_ITER. If s = 0, the updated decoded codeword is a valid codeword; go to step iv. If s ≠ 0 and iter = MAX_ITER, stop iterating (decoding has failed) and go to step iv. Otherwise, go to step (3);

iv. Read the data A from storage block RAM_f and split out its most significant bit; this bit is the decoded codeword bit to be output;

(3) Variable node update:

d) Merge A1_new and C0_new into the (Q+1)-bit word A_new and write it back to RAM_f at the address from which A was read; merge B1_new and C0_new into the (Q+1)-bit word B_new and write it back to RAM_m at the address from which B was read, where it becomes the new data B stored in RAM_m; then return to step (2).

In both of the above schemes, the VNU, the CNU and the PCU are all implemented with fine-grained (step-refined) pipelining.

The VNU is implemented as a 5-stage pipeline. Stage 1 converts the input extrinsic information and channel information from sign-magnitude to two's complement and delays the input extrinsic information; stage 2 delays the extrinsic information further and sums the extrinsic information and the channel information; stage 3 delays this sum and subtracts each item of extrinsic information from it; stage 4 truncates the resulting differences and delays the sum further; stage 5 converts the truncated data from two's complement back to sign-magnitude for output, and outputs the sign of the sum as the codeword information. Summing more than two numbers within one clock cycle lengthens the critical path, so to reduce the critical path delay an additional pipeline register is inserted within stage 2.

The CNU is implemented as a 5-stage pipeline. Stage 1 separates the sign bit and the absolute value of each input extrinsic value. Stage 2 computes the sum of the sign bits of the inputs and delays the sign bits; it also finds the minimum and second minimum of the absolute values separated in stage 1 and delays those absolute values. Stage 3 continues to delay the sign bits, the absolute values, the sign sum, the minimum and the second minimum, and computes the products of the minimum and of the second minimum with the normalization factor alpha. Stage 4 adds each sign bit output by stage 3 to the sign sum, and compares each absolute value output by stage 3 with the minimum to select the output magnitude. Stage 5 merges all outputs of stage 4 and outputs them. The comparison and selection in stage 4 work as follows: if the absolute value output by stage 3 equals the minimum, the output is the product of the second minimum and alpha; otherwise, it is the product of the minimum and alpha.

The PCU is implemented as a single pipeline stage. Within this stage the input codeword bits are XORed to obtain the syndrome vector s; after a delay, all elements of s are added to obtain V. When V = 0, the syndrome vector s is zero.

Compared with the prior art, the present invention has the following beneficial effects:

(1) The invention proposes a storage method that improves the utilization of the LDPC decoder's storage resources. The method needs no additional block RAM to store the decoded codeword and uses no slices to store it, which effectively reduces the decoding system's demand for storage resources. Because the decoded codeword shares a storage block with the extrinsic information (or channel information), a single read of the storage block retrieves both the codeword bits and the extrinsic information. The PCU can therefore share one set of address information with the CNU and needs no additional address generation unit. In addition, for fully parallel decoders the method effectively relieves routing congestion and saves the address resources for accessing registers; for partially parallel and serial decoders, it reduces the number of storage blocks or saves FPGA logic resources.

(2) The processing units VNU, CNU and PCU of the invention are all implemented with fine-grained pipelining, which effectively reduces the critical path delay of the decoder and is necessary for improving LDPC decoder throughput. When the proposed method is implemented on an FPGA, it saves the considerable resources used to store the decoded codeword separately and the logic resources for generating PCU addresses, and it improves decoder throughput through fine-grained internal pipelining.

(3) The invention uses bit merging and splitting to obtain the three kinds of information used in LDPC iterative decoding: channel information, extrinsic information, and decoded codeword information. Because the decoded codeword shares a storage block with the extrinsic information (or channel information), the decoding system's demand for storage resources is effectively reduced.

(4) The invention also proposes a fast PCU structure with low resource usage: it needs no registers to store the long syndrome vector s, and no comparison (which would cause a large delay) between long-width data and a zero vector.

Brief description of the drawings

Fig. 1 shows the LDPC decoder structure for the high-efficiency storage method proposed by the present invention;

Fig. 2 shows the distribution of the nonzero elements of the parity-check matrix of LDPC (8176,7154);

Fig. 3 compares the performance of quantized normalized min-sum decoding with that of sum-product decoding;

Fig. 4 shows the pipeline structure of the 5-input VNU of the present invention;

Fig. 5 shows the pipeline structure of the 32-input CNU of the present invention;

Fig. 6 shows the pipeline structure of the present invention for finding the minimum and second minimum of 32 input values;

Fig. 7 shows the implementation structure of the PCU of the present invention.

Embodiment

As shown in Fig. 1, in the high-efficiency method of the present invention for storing the decoded codeword in an FPGA-based LDPC decoder, the LDPC decoder comprises storage blocks RAM_f and RAM_m with a storage width of (Q+1) bits, a variable node processing unit VNU, a check node processing unit CNU, and a parity-check equation computation unit PCU. The method comprises the following steps:

(1) Initialization: the decoder extends the received Q-bit channel information to (Q+1) bits (the merge unit pads the most significant bit with '0' or '1') and stores it in RAM_f; RAM_m is initialized to all zeros; the iteration counter is initialized to iter = 0 and the maximum number of iterations is set to MAX_ITER;

(2) Variable node update: a) read (Q+1)-bit data A and B from storage blocks RAM_f and RAM_m respectively; b) split A and B each into Q-bit data (the low Q bits of A and B, denoted A1 and B1) and 1-bit data (the most significant bits of A and B, denoted A0 and B0). In the storage organization of this patent, A1 and B1 are the channel information and the extrinsic information respectively, while A0 and B0 hold codeword information; c) feed the channel information and the extrinsic information to the VNU, which operates on them to produce the updated channel information A1_new, the updated extrinsic information B1_new, and the decoded codeword bit C0_new for this iteration; d) merge the Q-bit A1_new and C0_new into the (Q+1)-bit A_new and write it back to RAM_f at the address from which A was read; merge the Q-bit B1_new and C0_new into the (Q+1)-bit B_new and write it back to RAM_m at the address from which B was read, as the new data B stored in RAM_m.

(3) Check node update: a) read (Q+1)-bit data B from storage block RAM_m; the data read from RAM_m correspond to the positions of the nonzero entries in the rows of the LDPC parity-check matrix; b) split B into Q-bit data (the low Q bits of B, denoted B1) and 1-bit data (the most significant bit of B, denoted B0), so that B0 is codeword information and B1 is extrinsic information; c) feed the extrinsic information B1 to the CNU, which operates on it to produce the updated extrinsic information B1_new; extend the updated Q-bit B1_new to the (Q+1)-bit B_new (padding the most significant bit with '0' or '1') and write it back to RAM_m at the address from which B was read, as the new data B stored in RAM_m. At the same time, feed the codeword information B0 to the PCU, which operates on it to obtain the syndrome vector s. After step (3) completes, iter is incremented by 1;

(4) If the syndrome vector s is zero, the updated decoded codeword is a valid codeword and awaits output. Otherwise, if iter = MAX_ITER, iteration stops, a decoding-failure flag is raised, and the codeword (in which some errors may have been corrected) awaits output. Otherwise, return to (2) and continue iterating. To output the final decoded codeword, read the data A sequentially from RAM_f and split out the most significant bit of each A; these bits form the decoded codeword to be output.

If, in initialization step (1), RAM_m is instead initialized with the information in RAM_f, the check node update must be performed first, followed by the variable node update.

The VNU (variable node processing unit), CNU (check node processing unit) and PCU (parity-check equation computation unit) in this method are all implemented with fine-grained pipelining, which effectively reduces the critical path delay of the decoder and is necessary for improving LDPC decoder throughput.

Once the decoding algorithm has been chosen, the normalization factor used in the method, the quantization bit width Q of the information updated during decoding, and the maximum number of iterations MAX_ITER can be obtained by those skilled in the art through simulation of the coding/decoding system. This is common knowledge in the field and is not described in detail here.

The present invention is described in detail below using LDPC (8176,7154) as an example.

As shown in Fig. 2, LDPC (8176,7154) is a QC-LDPC (quasi-cyclic LDPC) code. Its parity-check matrix consists of 2 × 16 circulant matrices of size L × L (L = 511), giving a matrix of size M × N = 1022 × 8176. Each A_{i,j} is a 511 × 511 circulant matrix with two nonzero elements in every row and column, i.e., A_{i,j} has row weight 2; the parity-check matrix therefore has row weight 2 × 16 = 32 and column weight 2 × 2 = 4. The parity-check matrix has the following structure:

\begin{bmatrix} A_{1,1} & A_{1,2} & \cdots & A_{1,14} & A_{1,15} & A_{1,16} \\ A_{2,1} & A_{2,2} & \cdots & A_{2,14} & A_{2,15} & A_{2,16} \end{bmatrix}
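The block structure can be sanity-checked in a few lines of Python. The sketch below expands a 2 × 16 array of weight-2 circulants of size 511 into per-row column lists and counts the weights. The shift offsets returned by `hypothetical_offsets` are invented for illustration only; they are not the offsets of the actual (8176,7154) code, which the text does not give, but any two distinct shifts per circulant yield the same dimensions and weights.

```python
L, BLOCK_ROWS, BLOCK_COLS = 511, 2, 16


def hypothetical_offsets(i, j):
    """Two distinct shifts for circulant A_{i+1,j+1} (invented, NOT the real code's)."""
    s1 = (37 * i + 11 * j) % L
    s2 = (s1 + 1 + 13 * j + 5 * i) % L   # differs from s1 because 1 + 13j + 5i < 511
    return s1, s2


def expand():
    """Return, for each of the M check rows, the list of columns holding a 1."""
    rows = []
    for i in range(BLOCK_ROWS):
        for r in range(L):
            cols = []
            for j in range(BLOCK_COLS):
                for s in hypothetical_offsets(i, j):
                    cols.append(j * L + (r + s) % L)   # column inside block column j
            rows.append(cols)
    return rows


rows = expand()
M, N = len(rows), BLOCK_COLS * L
col_weight = [0] * N
for cols in rows:
    for c in cols:
        col_weight[c] += 1

print(M, N, {len(set(c)) for c in rows}, set(col_weight))  # prints: 1022 8176 {32} {4}
```

Every row touches 2 columns in each of the 16 block columns, and every column is hit twice in each of the 2 block rows, which is exactly the 32 and 4 stated above.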

LDPC codes are usually decoded with the sum-product algorithm, but its nonlinear operations are complex and require look-up tables, which must be stored in ROM in hardware. The min-sum algorithm avoids look-up tables and reduces the use of hardware storage resources, and the normalized min-sum algorithm, obtained by adding a normalization correction, performs very close to the sum-product algorithm (see J. H. Chen et al., "Reduced-complexity decoding of LDPC codes," IEEE Trans. on Commun., vol. 53, no. 8, Aug. 2005, pp. 1288-1299). Because the normalized min-sum algorithm consists mainly of comparisons, it needs no channel estimation, whereas channel-estimation errors degrade sum-product decoding, and channel estimation itself occupies a large amount of FPGA resources. This embodiment therefore uses the normalized min-sum algorithm, for which the normalization factor must be determined. Before the decoder is designed, the normalization factor, the quantization bit width Q of the information updated during decoding, and the maximum number of iterations MAX_ITER must be determined. The coding/decoding performance of LDPC (8176,7154) was simulated, giving the curves shown in Fig. 3.

The simulation curves in Fig. 3 show that, for sum-product decoding, 10 iterations lose less than 0.2 dB relative to 50 iterations, so 10 iterations are sufficient in a practical implementation. The invention selects normalized min-sum decoding with alpha = 0.75 (alpha in the present invention equals 1/α in Eq. (9) of the J. H. Chen article) and quantizes the data in the decoder (both the channel information and the extrinsic information passed between nodes use 7-bit quantization, with 2 fractional bits for each). Compared with sum-product decoding with the same number of iterations, the decoding performance loss is less than 0.1 dB, approximately 0.05 dB. Finally, the invention adopts normalized min-sum decoding with normalization factor 0.75, quantization bit width Q = 7, and maximum iteration count MAX_ITER = 10.
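A minimal sketch of the Q = 7, 2-fractional-bit quantization is given below, assuming a sign-magnitude field (1 sign bit, 6 magnitude bits), round-to-nearest and symmetric saturation. The rounding and saturation rules are assumptions, since the text specifies only the bit widths; `quantize` and `dequantize` are hypothetical names.

```python
Q, FRAC = 7, 2                 # total bits, fractional bits (from the embodiment)
SIGN = 1 << (Q - 1)            # sign bit position
MAX_MAG = SIGN - 1             # 63 steps of 0.25, i.e. at most 15.75


def quantize(llr):
    """Real-valued LLR -> Q-bit sign-magnitude field (assumed rounding/saturation).

    Python's round() rounds halves to even; a hardware quantizer may differ.
    """
    mag = min(int(round(abs(llr) * (1 << FRAC))), MAX_MAG)
    return (SIGN | mag) if llr < 0 else mag


def dequantize(field):
    """Q-bit sign-magnitude field -> real value in steps of 2**-FRAC."""
    val = (field & MAX_MAG) / (1 << FRAC)
    return -val if field & SIGN else val
```

For example, `quantize(-0.3)` gives 65 (sign bit set, magnitude 1, i.e. -0.25), and any LLR above 15.75 in magnitude saturates at 63.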

For this QC-LDPC (8176,7154) decoder, 16 RAM_f memories (RAM_f1, ..., RAM_f16) and 16 × 2 × 2 = 64 RAM_m memories are needed, each (Q+1) = 8 bits wide, together with 16 variable node processing units (VNU1, ..., VNU16), 2 check node processing units (CNU1, CNU2) and 2 parity-check equation computation units (PCU1, PCU2). The specific implementation follows steps (1) to (4) described above. The fine-grained pipelined designs of the VNU (variable node processing unit), CNU (check node processing unit) and PCU (parity-check equation computation unit) used in steps (2) and (3) are introduced below. This structure is a partially parallel decoder. When there is one VNU, one CNU and one PCU, the structure is a serial decoder; when there are 8176 VNUs and 1022 CNUs and PCUs each, it is a fully parallel decoder.

The LDPC (8176,7154) parity-check matrix structure above has row weight 32 and column weight 4. The VNU therefore has 5 inputs (4 extrinsic values in1, ..., in4 and 1 channel value f) and 5 outputs (4 extrinsic values out1, ..., out4 and 1 codeword bit c); the CNU has 32 inputs (32 extrinsic values in1, ..., in32) and 32 outputs (32 extrinsic values out1, ..., out32); and the PCU has 32 inputs (32 codeword bits in1, ..., in32) and 1 output (the syndrome vector s). If a different LDPC decoding algorithm is adopted, only the processing units VNU and CNU change and everything else stays the same, so this efficient codeword storage structure is suitable for the iterative decoding algorithm of any LDPC code. For example, with the sum-product algorithm instead of the normalized min-sum algorithm, the CNU contains nonlinear operations that must be implemented with look-up tables. For the normalized min-sum algorithm adopted here (see the J. H. Chen article cited above), the VNU mainly performs additions and subtractions, which are more convenient in two's complement, while the CNU takes absolute values and compares magnitudes, which is more convenient in sign-magnitude form. Since the VNU processes only 4 extrinsic values whereas the CNU must process 32, the sign-magnitude to two's complement conversion is placed in the VNU. Hence RAM_m stores sign-magnitude data, and because only the VNU reads from RAM_f, RAM_f stores two's complement data. The conversions are as follows: the two's complement of a positive number equals its sign-magnitude form, while the two's complement of a negative number is obtained by inverting the magnitude bits of its sign-magnitude form and adding 1. Likewise, the sign-magnitude form of a positive number equals its two's complement, while the sign-magnitude form of a negative number is obtained by inverting the bits of its two's complement (other than the sign bit) and adding 1.
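The two conversion rules can be written as bit operations on a Q-bit field. The Python sketch below follows the invert-and-add-1 rules stated above for Q = 7. The saturation of the one two's complement value with no sign-magnitude counterpart (-64) is an assumption the text does not address, and the function names are hypothetical.

```python
Q = 7
SIGN = 1 << (Q - 1)          # 0b1000000
MAG = SIGN - 1               # magnitude bits 0b0111111
MASK = (1 << Q) - 1          # all Q bits


def sm_to_tc(f):
    """Sign-magnitude -> two's complement (Q bits)."""
    if not f & SIGN:
        return f                               # positive: identical
    return ((SIGN | (~f & MAG)) + 1) & MASK    # negative: invert magnitude bits, add 1


def tc_to_sm(f):
    """Two's complement -> sign-magnitude (Q bits); -2**(Q-1) saturates to -MAG."""
    if not f & SIGN:
        return f                               # positive: identical
    return SIGN | min((~f & MAG) + 1, MAG)     # invert bits below the sign, add 1


def int_to_sm(v):
    """Helper: signed int -> sign-magnitude, saturating at +/-MAG."""
    m = min(abs(v), MAG)
    return (SIGN | m) if v < 0 else m
```

For -3, the sign-magnitude field 0b1000011 becomes 0b1111101 in two's complement, and `tc_to_sm` maps it back. Placing this conversion in the VNU means it is paid 4 times per variable node rather than 32 times per check node.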

The output of the VNU is related to its inputs by out_i = sum - in_i, i = 1, 2, 3, 4, where sum is the sum of the 5 inputs. The pipeline structure of the VNU is shown in Fig. 4; to reduce the critical path delay of the VNU, essentially 5 pipeline stages are inserted, and the dashed lines in Fig. 4 mark the pipeline registers. All inputs and outputs of the VNU are in sign-magnitude form. In Fig. 4, f and c denote the channel information and the codeword information respectively, and in1, ..., in4 denote the extrinsic information. The main function of the adder array is to compute sum, the sum of the 5 inputs; the subtractor array subtracts each of the 4 extrinsic inputs from sum. To avoid loss of precision, the intermediate data are wider than the inputs and outputs, so a truncation module must truncate the intermediate data to produce the outputs (for the truncation method, see G. Montorsi and S. Benedetto, "Design of fixed-point iterative decoders for concatenated codes with interleavers," IEEE J. on Selected Areas in Commun., vol. 19, no. 5, May 2001, pp. 871-882). Because summing more than two numbers in one stage lengthens the critical path, the pipeline is refined further by inserting an additional pipeline register within stage 2.
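A functional (cycle-free) Python model of the 5-input VNU is sketched below: it reproduces out_i = sum - in_i and the codeword bit, with comments marking which pipeline stage each line corresponds to. It assumes Q = 7 sign-magnitude inputs and models the stage-4 truncation as saturation to the Q-bit range; the Montorsi truncation method cited above may differ in detail. Names are hypothetical.

```python
Q = 7
SIGN, MAG = 1 << (Q - 1), (1 << (Q - 1)) - 1


def sm_to_int(x):
    """Q-bit sign-magnitude field -> signed int."""
    return -(x & MAG) if x & SIGN else x & MAG


def int_to_sm(v):
    """Signed int -> Q-bit sign-magnitude (truncation modelled as saturation)."""
    m = min(abs(v), MAG)
    return (SIGN | m) if v < 0 else m


def vnu(f, ins):
    """Functional model of the 5-input VNU; returns ([out1..out4], c)."""
    ch = sm_to_int(f)                        # stage 1: sign-magnitude -> signed
    vals = [sm_to_int(x) for x in ins]
    total = ch + sum(vals)                   # stage 2: adder array (wider width)
    diffs = [total - v for v in vals]        # stage 3: out_i = sum - in_i
    outs = [int_to_sm(d) for d in diffs]     # stages 4-5: truncate, convert back
    c = 1 if total < 0 else 0                # stage 5: sign of sum = codeword bit
    return outs, c
```

Five 7-bit inputs need about 3 extra bits internally, which Python integers provide implicitly; in hardware this is the wider intermediate width mentioned above.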

The output of the CNU is related to its inputs by:

out_i = \begin{cases} \prod_{j \neq i} \mathrm{sign}(in_j) \times \mathrm{alpha} \times \mathrm{min}, & \mathrm{min} \neq |in_i| \\ \prod_{j \neq i} \mathrm{sign}(in_j) \times \mathrm{alpha} \times \mathrm{sub\_min}, & \mathrm{min} = |in_i| \end{cases}

Here sign(x) denotes the sign of x, alpha is the normalization factor (a constant), min is the minimum of the absolute values of all inputs, and sub_min is the second minimum of those absolute values.

The pipeline organization that realizes CNU is as shown in Figure 5; The structure that can see CNU is made up of two parts up and down, and top mainly is that the symbol to the input data carries out computing, and the lower part mainly is that the absolute value (being numerical values recited) of importing data is carried out computing; CNU has inserted 5 level production lines generally; The 1st level production line is used for isolating the sign bit and the absolute value of input data, that the 2nd level production line is asked 32 sign bits and sign32 (for ask the 1bit data and when hardware is realized, adopt XOR xor operation), reach 32 sign bits postponed; And obtain the minimum value and the sub-minimum of 32 absolute values, and 32 absolute values are postponed; The 3rd level streamline continues 32 sign bits, 32 absolute values, sign32, minimum value and sub-minimums are postponed respectively, and obtain minimum value, sub-minimum respectively with the alpha multiplied result; The 4th level production line is sued for peace respectively to sign32 and 32 sign bits and is obtained 32 outputs; And obtain 32 outputs (if input is a minimum value according to the relation of 32 absolute values and minimum value; Then be output as the long-pending of sub-minimum and alpha, otherwise, be output as the long-pending of minimum value and alpha); The 5th level production line will merge the output that (sign bit is as a high position, and absolute value is as other positions) obtains 32 CNU respectively to the output of 32 sign bit computings and output to 32 absolute value bit arithmetics

To increase processing speed further, the pipeline is refined again by inserting additional pipeline registers within stages 2 and 3. Stage 2 must find the minimum and second minimum of 32 non-negative numbers; refining this stage again, the task is decomposed into units that find the minimum and second minimum of 4 numbers, arranged in a 4-stage pipeline, as shown in Fig. 6. Stage 3 multiplies data by alpha. This embodiment selects alpha = 0.75, and data × 0.75 = (data/2) + (data/4), i.e., data × 0.75 equals data shifted right by one bit plus data shifted right by two bits. In this way the FPGA multiplies a fixed-point number by a fractional constant using only shifts and an addition, so two pipeline stages can be inserted within stage 3 to raise the overall processing speed of the CNU.
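The Python sketch below models the CNU equation above functionally: sign32 as an XOR of sign bits, a minimum/second-minimum search, multiplication by alpha = 0.75 as two shifts and an add, and the final merge. The min/sub-min search is written as a generic pairwise tree that propagates (minimum, index of minimum, second minimum); this is an assumed stand-in that matches the function of Fig. 6, not necessarily its exact 4-input grouping. Inputs are assumed to be Q = 7 sign-magnitude fields, at least two per call, and all names are hypothetical.

```python
Q = 7
SIGN, MAG = 1 << (Q - 1), (1 << (Q - 1)) - 1
INF = float("inf")


def min_submin(mags):
    """Tree reduction to (min, index of min, second min); needs >= 2 values."""
    level = [(m, i, INF) for i, m in enumerate(mags)]
    while len(level) > 1:
        nxt = []
        for a, b in zip(level[0::2], level[1::2]):
            lo, hi = (a, b) if a[0] <= b[0] else (b, a)
            nxt.append((lo[0], lo[1], min(lo[2], hi[0])))   # merge two nodes
        if len(level) % 2:
            nxt.append(level[-1])                           # odd node passes through
        level = nxt
    return level[0]


def times_075(x):
    """alpha = 0.75 as (x >> 1) + (x >> 2); fractional bits are truncated."""
    return (x >> 1) + (x >> 2)


def cnu(ins):
    """Normalized min-sum CNU on Q-bit sign-magnitude inputs."""
    signs = [(x >> (Q - 1)) & 1 for x in ins]     # stage 1: split sign / magnitude
    mags = [x & MAG for x in ins]
    sign_all = 0
    for s in signs:                               # stage 2: sign32 = XOR of signs
        sign_all ^= s
    m1, i1, m2 = min_submin(mags)                 # stage 2: min and second min
    a1, a2 = times_075(m1), times_075(m2)         # stage 3: multiply by alpha
    # stages 4-5: remove own sign via XOR, select magnitude, put sign on top
    return [((s ^ sign_all) << (Q - 1)) | (a2 if i == i1 else a1)
            for i, s in enumerate(signs)]
```

XORing each sign bit with sign32 yields the product of the other inputs' signs, which is why sign32 only needs to be computed once. Note that `times_075(5)` returns 3 rather than 3.75, since the shifts discard fractional bits.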

The PCU multiplies the codeword by the parity-check matrix to produce the syndrome vector s. For LDPC (8176,7154), s has length 1022. Because the check matrix has a block structure, the length-1022 vector s can be split into 2 vectors of length L = 511. The decoder ultimately only needs to decide whether s is zero. A vector is the zero vector only when all of its components are zero, so the implementation can accumulate all elements of s into a single result V, as shown in Fig. 7. Clearly, s is zero if and only if V = 0. The implementation in Fig. 7 needs no registers to store the wide syndrome vector s. It also avoids comparing wide data against the zero vector, a comparison that would cause a large delay. The XOR of 32 numbers in Fig. 7 is further pipelined.
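The sketch below illustrates the V = 0 test on a small made-up parity-check matrix, not the (8176,7154) matrix. Each check bit s_j is the XOR of the codeword bits in row j. The check bits are accumulated into V as they are produced instead of being stored as a 1022-bit word. The row-list representation of H and the name `pcu_check` are assumptions for illustration.

```python
from functools import reduce
from operator import xor


def pcu_check(H_rows, codeword):
    """Return V, the sum of all syndrome bits; the codeword is valid iff V == 0.

    H_rows[j] lists the column indices of the 1s in row j of H.
    """
    V = 0
    for row in H_rows:
        s_j = reduce(xor, (codeword[c] for c in row), 0)  # one parity check
        V += s_j  # accumulate rather than storing the whole vector s
    return V


# Toy 3 x 7 parity-check matrix (illustrative only).
H_TOY = [[0, 1, 2, 4], [0, 1, 3, 5], [0, 2, 3, 6]]
```

With this toy matrix, `pcu_check(H_TOY, [1, 0, 0, 0, 1, 1, 1])` returns 0. Flipping bit 3 of that word makes two checks fail, so V becomes 2.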

For a fair comparison, the present invention uses the same code and the same FPGA as the article by Zhongfeng Wang (Zhongfeng Wang, Zhiqiang Cui, "Low-Complexity High-Speed Decoder Design for Quasi-Cyclic LDPC Codes," IEEE Trans. on VLSI Systems, vol. 15, no. 1, Jan. 2007, pp. 104-114). The code is QC-LDPC (8176,7154) and the FPGA is a Xilinx Virtex-II XC2V6000-6, as shown in Table 1. The decoder described here has a large advantage over Zhongfeng Wang's decoder in both slice and BRAM usage. It uses less than half the resources of Zhongfeng Wang's decoder, mainly because of the efficient storage scheme it adopts for the decoded codeword. Zhongfeng Wang's decoder stores the decoded codeword in slice resources and therefore consumes a large number of slices. It also uses the sum-product algorithm, whose nonlinear operations are complex and require lookup tables. In hardware these tables must be stored in ROM, which occupies a large amount of the FPGA's BRAM. The normalized min-sum algorithm adopted in the present invention does not consume this BRAM.

The low critical-path delay and high throughput of the decoder of the present invention are mainly due to two factors. a) VNU, CNU and PCU are designed with stepwise pipeline refinement, and the PCU structure uses few resources and runs at high speed. b) The decoder of this embodiment uses a maximum of 10 iterations, whereas Zhongfeng Wang's decoder uses 15. Nevertheless, Fig. 3 of the present invention and Fig. 21 of Zhongfeng Wang's article show that the decoding performance loss of this embodiment's QC-LDPC (8176,7154) decoder is only about 0.1 dB relative to Zhongfeng Wang's decoder. This loss can be ignored in practical applications.

Table 1 Implementation statistics on the Xilinx FPGA XC2V6000-6

Anything not described in detail in the present invention is technology well known to those skilled in the art.

Claims

1. An efficient storage method for the decoded codeword of an FPGA-based LDPC decoder, characterized in that: the LDPC decoder structure comprises storage blocks RAM_f and RAM_m with a storage width of (Q+1) bits, a variable node processing unit VNU, a check node processing unit CNU, and a parity-check equation computation unit PCU; the method steps are as follows:

(1) Initialization step: the LDPC decoder extends the received Q-bit channel information to (Q+1) bits and stores it in RAM_f; RAM_m is initialized to all zeros; the iteration counter iter is initialized to 0, and the maximum number of iterations is MAX_ITER;
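The initialization of claim 1 is sketched below with Python lists standing in for the two RAMs. The value of Q, the number of RAM_m entries, and the default MAX_ITER are hypothetical parameters. The value placed in the extra bit at initialization is not stated in this extract, so the sketch assumes it starts at 0. The extra bit is the most significant bit, which later holds the decoded codeword bit.

```python
Q = 6  # hypothetical soft-information width in bits


def init_claim1(channel_words, num_edges, max_iter=10):
    """Claim-1 style initialization (sketch).

    channel_words: received Q-bit channel values, already quantized.
    num_edges: number of RAM_m entries (assumed: one per nonzero of H).
    Returns (ram_f, ram_m, iteration, max_iter).
    """
    soft_mask = (1 << Q) - 1
    # Widen each Q-bit word to Q+1 bits. Bit Q (the new MSB) is reserved for
    # the decoded codeword bit and is assumed to start at 0 here.
    ram_f = [w & soft_mask for w in channel_words]
    ram_m = [0] * num_edges  # RAM_m cleared to all zeros
    iteration = 0            # iter = 0
    return ram_f, ram_m, iteration, max_iter
```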

(2) Variable node update step:

(3) Check node update step:

2. An efficient storage method for the decoded codeword of an FPGA-based LDPC decoder, characterized in that: the LDPC decoder structure comprises storage blocks RAM_f and RAM_m with a storage width of (Q+1) bits, a variable node processing unit VNU, a check node processing unit CNU, and a parity-check equation computation unit PCU; the method steps are as follows:

(1) Initialization step: the LDPC decoder extends the received Q-bit channel information to (Q+1) bits and stores it in RAM_f; RAM_m is initialized with this (Q+1)-bit information at the positions of the nonzero entries of the LDPC decoder's parity-check matrix; the iteration counter iter is initialized to 0, and the maximum number of iterations is MAX_ITER;
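Claim 2 updates check nodes first, so RAM_m starts with the channel word of the variable node at each nonzero position of H. The first check node update then sees the channel values. The sketch below uses a row-list representation of H and stores RAM_m entries in row-major order. Both choices are assumptions, as is the initial MSB of 0. The function name is hypothetical.

```python
Q = 6  # hypothetical soft-information width in bits


def init_claim2(H_rows, channel_words, max_iter=10):
    """Claim-2 style initialization (sketch).

    H_rows[j] lists the columns holding a 1 in row j of H.
    Returns (ram_f, ram_m, iteration, max_iter).
    """
    soft_mask = (1 << Q) - 1
    # (Q+1)-bit words; the codeword MSB is assumed to start at 0.
    ram_f = [w & soft_mask for w in channel_words]
    # One RAM_m entry per nonzero (j, c) of H, holding variable c's channel word.
    ram_m = [ram_f[c] for row in H_rows for c in row]
    return ram_f, ram_m, 0, max_iter
```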

(2) Check node update step:

(3) Variable node update step:

d) Merge the updated channel information A1_new and the decoded codeword C0_new into a (Q+1)-bit word A_new, and write A_new back into storage block RAM_f at the address from which data A was read; merge the updated extrinsic information B1_new and the decoded codeword C0_new into a (Q+1)-bit word B_new, and write B_new back into RAM_m at the address from which data B was read, where it becomes the new data B stored in RAM_m; return to step (2) and continue;

iv. Read data A from storage block RAM_f and split off its most significant bit, which is the decoded codeword to be output.
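The sketch below shows how step d) and step iv share one (Q+1)-bit word between soft information and the decoded bit. The decoded bit goes in the MSB and the Q-bit soft value goes below it. Q and the helper names `merge_word` and `split_word` are hypothetical.

```python
Q = 6  # hypothetical soft-information width in bits
SOFT_MASK = (1 << Q) - 1


def merge_word(soft, code_bit):
    """Form a (Q+1)-bit word: decoded bit in the MSB, Q-bit soft value below."""
    return ((code_bit & 1) << Q) | (soft & SOFT_MASK)


def split_word(word):
    """Return (decoded bit, Q-bit soft value) from a (Q+1)-bit word."""
    return (word >> Q) & 1, word & SOFT_MASK


# Step d): A_new = merge(A1_new, C0_new), B_new = merge(B1_new, C0_new).
# Step iv: the output bit is split_word(A)[0].
```

Because both RAMs are already (Q+1) bits wide, the decoded codeword needs no storage of its own, and it is read out together with the soft value.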

3. The efficient storage method for the decoded codeword of an FPGA-based LDPC decoder according to claim 1 or 2, characterized in that: the variable node processing unit VNU, the check node processing unit CNU and the parity-check equation computation unit PCU are all implemented using stepwise pipeline refinement.

4. The efficient storage method for the decoded codeword of an FPGA-based LDPC decoder according to claim 3, characterized in that: the variable node processing unit VNU is implemented with a 5-stage pipeline; the 1st stage converts the input extrinsic information and channel information from sign-magnitude to two's complement form and delays the input extrinsic information; the 2nd stage continues to delay the extrinsic information and sums the extrinsic information and the channel information; the 3rd stage delays this sum and computes the difference between the sum and each item of extrinsic information; the 4th stage truncates these differences and continues to delay the sum; the 5th stage converts the truncated data from two's complement back to sign-magnitude for output, and outputs the sign of the sum as the codeword information.
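The following is a behavioral sketch of the five VNU stages of claim 4, assuming Q-bit sign-magnitude words. The truncation step is modeled as saturation to the largest representable magnitude. That choice, like the function names and the convention "negative sum → codeword bit 1", is an assumption rather than something the claim states.

```python
def sm_to_int(word, q):
    """Decode a q-bit sign-magnitude word (sign in the MSB) to a Python int."""
    mag = word & ((1 << (q - 1)) - 1)
    return -mag if (word >> (q - 1)) & 1 else mag


def int_to_sm(value, q):
    """Encode an int whose magnitude fits in q-1 bits as sign-magnitude."""
    return ((1 << (q - 1)) | -value) if value < 0 else value


def vnu_update(channel_sm, ext_sm, q=6):
    """Return (outgoing sign-magnitude messages, decoded codeword bit)."""
    # Stage 1: sign-magnitude -> two's complement (plain ints here).
    ch = sm_to_int(channel_sm, q)
    ext = [sm_to_int(e, q) for e in ext_sm]
    # Stage 2: total = channel value + all extrinsic values.
    total = ch + sum(ext)
    limit = (1 << (q - 1)) - 1
    outs = []
    for e in ext:
        # Stage 3: remove this edge's own contribution.
        d = total - e
        # Stage 4: saturate to the q-bit sign-magnitude range (assumed).
        d = max(-limit, min(limit, d))
        # Stage 5: back to sign-magnitude.
        outs.append(int_to_sm(d, q))
    # Stage 5: the sign of the total is the decoded codeword bit.
    code_bit = 1 if total < 0 else 0
    return outs, code_bit
```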

5. The efficient storage method for the decoded codeword of an FPGA-based LDPC decoder according to claim 4, characterized in that: when the sum of more than two operands must be computed within one clock cycle, additional pipeline stages are inserted into the 2nd pipeline stage.
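Claim 5 can be pictured as a tree of two-input adders with a pipeline register after each tree level. The sketch below counts those levels. Five operands, for example a channel value plus four extrinsic messages, need 3 levels, and in general n operands need ceil(log2 n) levels. The function name and the pass-through handling of an odd operand are illustrative assumptions.

```python
def adder_tree(operands):
    """Sum operands with two-input adders, one tree level per pipeline stage.

    Returns (total, number_of_levels). Requires at least one operand.
    """
    level = list(operands)
    levels = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd operand is only registered this level
        level = nxt
        levels += 1
    return level[0], levels
```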

6. The efficient storage method for the decoded codeword of an FPGA-based LDPC decoder according to claim 3, characterized in that: the check node processing unit CNU is implemented with a 5-stage pipeline; the 1st stage separates the sign bits and absolute values of the input extrinsic information; the 2nd stage computes the sum of the sign bits of the input extrinsic information and delays the sign bits, finds the minimum and second minimum of the absolute values separated by the 1st stage, and delays the absolute values; the 3rd stage continues to delay the sign bits, the absolute values, the sign-bit sum, the minimum and the second minimum, and computes the products of the minimum and of the second minimum with the normalization correction factor alpha; the 4th stage adds each sign bit output by the 3rd stage to the sign-bit sum, and compares each absolute value output by the 3rd stage with the minimum to select its output; the 5th stage merges all outputs of the 4th stage and outputs the result.

7. The efficient storage method for the decoded codeword of an FPGA-based LDPC decoder according to claim 6, characterized in that the 4th stage compares each absolute value output by the 3rd stage with the minimum and selects its output as follows: when the absolute value output by the 3rd stage is the minimum, the output is the product of the second minimum and the normalization correction factor alpha; otherwise, the output is the product of the minimum and alpha.

8. The efficient storage method for the decoded codeword of an FPGA-based LDPC decoder according to claim 3, characterized in that: the parity-check equation computation unit PCU is implemented with a 1-stage pipeline, in which the input codeword information is XORed to obtain the syndrome vector s; after a delay, all elements of s are added to obtain V; when V = 0, the syndrome vector s is zero.