CN103166648B

CN103166648B - LDPC decoder and implementation method thereof

Info

Publication number: CN103166648B
Application number: CN201110418739.0A
Authority: CN
Inventors: 雷海燕
Original assignee: Leadcore Technology Co Ltd
Current assignee: Leadcore Technology Co Ltd
Priority date: 2011-12-14
Filing date: 2011-12-14
Publication date: 2016-03-30
Anticipated expiration: 2031-12-14
Also published as: CN103166648A

Abstract

The present invention discloses an LDPC decoder and an implementation method thereof. The LDPC decoder comprises an input buffer unit, a state control and address generation module, a check node processing module, a variable node processing module, and a decision codeword storage unit; the input buffer unit comprises an H matrix array unit, a V matrix array unit, and an initial information storage unit. The invention stores the row-matrix and column-matrix calculation results in only 57 RAM blocks, reducing the number of RAMs by nearly 50% and reducing the storage space required to implement the LDPC decoder. In addition, the invention performs check node operations with only 5 CNU arithmetic units and variable node operations with only 9 VNU arithmetic units, which reduces the arithmetic units required to implement the LDPC decoder and saves chip resources.

Description

LDPC decoder and implementation method thereof

Technical Field

The present invention relates to an LDPC decoder and an implementation method thereof, and in particular, to an LDPC decoder conforming to a CMMB standard and an implementation method thereof.

Background

In modern communication systems, error correcting codes are an important means of improving the reliability and power efficiency of channel transmission. LDPC (low-density parity-check) codes are among the error correcting codes that currently come closest to the Shannon limit, and they are widely used in deep space communication, optical fiber communication, satellite digital video and audio broadcasting, and similar fields.

In the China Mobile Multimedia Broadcasting (CMMB) system, RS (Reed-Solomon) codes and LDPC codes are concatenated as the channel error correction codes. The LDPC code length is 9216 bits, code rates of 1/2 and 3/4 are supported, and the data throughput is 10 Mbit/s.

The LDPC code in the CMMB system is a regular LDPC code whose H matrix has column degree 3 for every column. The row degree of the H matrix is 6 at code rate 1/2 and 12 at code rate 3/4. Further analysis shows the following.

The H matrix of the LDPC code in the CMMB system is expanded from a base matrix. The expansion can be described in two ways, a row expansion mode and a column expansion mode.

(1) Row expansion mode:

The H matrix of code rate 1/2 is expanded from its first 18 rows as follows: any subsequent row m is obtained by cyclically shifting row i (i = m % 18) to the right by j positions (j = [m/18] × 36), where 0 ≤ [m/18] < 256. Here % denotes the modulo operation and [m/18] denotes m divided by 18, rounded down.

The H matrix of code rate 3/4 is expanded from its first 9 rows as follows: any subsequent row m is obtained by cyclically shifting row i (i = m % 9) to the right by j positions (j = [m/9] × 36). Here % denotes the modulo operation and [m/9] denotes m divided by 9, rounded down.

(2) Column expansion mode:

The H matrix of code rate 1/2 is expanded from a 4608 × 36 matrix. Column block t is obtained by cyclically shifting that matrix downward by t × 18 rows, where 0 ≤ t < 256.

The H matrix of code rate 3/4 is expanded from a 2304 × 36 matrix. Column block t is obtained by cyclically shifting that matrix downward by t × 9 rows, where 0 ≤ t < 256.
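The row expansion rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `expand_row` and `toy_base` are hypothetical names, the base rows shown are made-up toy values rather than the CMMB base matrix, and the shift step of 36 columns is taken from the expansion rule described above.

```python
def expand_row(base_rows, m, shift=36, n_cols=9216):
    """Return the sorted column indices of the ones in row m of H.

    base_rows[i] lists the column indices of the ones in base row i
    (hypothetical placeholder data, not the real CMMB table).
    """
    n_base = len(base_rows)        # 18 for rate 1/2, 9 for rate 3/4
    i = m % n_base                 # base row that row m is derived from
    j = (m // n_base) * shift      # cyclic right shift, in columns
    return sorted((c + j) % n_cols for c in base_rows[i])


# Toy base matrix with two base rows (illustrative values only).
toy_base = [[0, 5, 9210], [3, 7, 40]]
row0 = expand_row(toy_base, 0)     # base row 0, no shift
row2 = expand_row(toy_base, 2)     # base row 0 shifted by 36, wraps around
row3 = expand_row(toy_base, 3)     # base row 1 shifted by 36
```

Because every row is derived from a base row and a shift, a decoder only needs the base positions plus an address offset, which is what makes the compact storage discussed next possible.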

It can be seen from the CMMB check matrix that the memory needs to store only the non-zero elements of each sub-matrix, provided these elements can be addressed accurately when needed. Moreover, the column degree is 3 at both code rates and the row degrees (6 and 12) are related by a factor of 2, so the variable node unit can be fully reused, and the check node unit can also be fully reused as long as the scheduling is arranged properly.

Generally, a conventional LDPC decoding algorithm includes the following steps:

Step one: the BP (belief propagation) algorithm in the logarithmic domain. For ease of description, the following variables are first defined:

F_n is the initial log-likelihood ratio (LLR) of the nth input bit;

L_mn is the LLR information passed from the mth check node to the nth information node;

Z_mn is the LLR information passed from the nth information node to the mth check node;

Z_n is the a posteriori LLR of the nth bit obtained after each iteration;

N(m) is the set of all information nodes connected to the mth check node;

N(m)\n is the set N(m) with information node n removed;

M(n) is the set of all check nodes connected to the nth information node;

M(n)\m is the set M(n) with check node m removed;

the decoding process of the BP algorithm is as follows:

(1) initialization

For all n and all m_i ∈ M(n), let Z_{m_i n} = F_n;

(2) Check node update. For each check node m and every bit node n_i connected to it, i.e. n_i ∈ N(m), compute

T_{m n_i} = \prod_{n' \in N(m) \setminus n_i} \frac{1 - \exp(Z_{mn'})}{1 + \exp(Z_{mn'})}

L_{m n_i} = \ln \frac{1 - T_{m n_i}}{1 + T_{m n_i}}

(3) Bit node update. For each bit node n and every check node m_i connected to it, i.e. m_i ∈ M(n), compute

Z_{m_i n} = F_n + \sum_{m \in M(n) \setminus m_i} L_{mn}

Z_n = F_n + \sum_{m \in M(n)} L_{mn}

(4) Complete one iteration, make a hard decision, and generate the hard-decision vector \hat{x}: each bit \hat{x}_n takes one value if Z_n > 0 and the other value otherwise. Then use the check matrix to test the following equation:

H \hat{x} = 0 \mod 2

If the equation holds, the correct decoding result has been obtained, decoding ends, and \hat{x} is a legal codeword. If it does not hold, the current iteration ends and the process continues with step (5).

(5) Repeat steps (2) to (4) until the maximum number of iterations is reached. If the iteration count exceeds the configured maximum and the equation has never been satisfied, decoding ends and a decoding failure is declared.
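The parity test in step (4) amounts to checking that every row of H covers an even number of ones in the hard-decision vector. A minimal sketch, assuming each check row is given as a list of the bit positions it connects to; `parity_checks_satisfied` and the toy data are hypothetical and not part of the patent:

```python
def parity_checks_satisfied(check_rows, x_hat):
    """Return True if every check row has even parity, i.e. H * x_hat = 0 (mod 2).

    check_rows[m] lists the bit positions in N(m); x_hat holds hard-decision bits.
    """
    return all(sum(x_hat[n] for n in row) % 2 == 0 for row in check_rows)


# Toy example with two checks over four bits (illustrative values only).
toy_checks = [[0, 1, 2], [1, 2, 3]]
ok = parity_checks_satisfied(toy_checks, [1, 1, 0, 1])    # both rows even
bad = parity_checks_satisfied(toy_checks, [1, 0, 0, 0])   # first row odd
```

In an iterative decoder this test is what lets decoding stop before the maximum iteration count is reached.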

Step two: the normalized min-sum algorithm.

In the check node update (2) of the BP algorithm, the following simplification is made:

L_{mn} = \alpha (-1)^{\overline{\sigma_m \oplus \sigma_{mn}}} \min_{n' \in N(m) \setminus n} |Z_{mn'}|

where \sigma_{mn} = 1 if Z_{mn} > 0 and \sigma_{mn} = 0 otherwise;

\alpha is a normalization factor.

Therefore, in the check node update, each outgoing message is the minimum magnitude among the incoming messages other than the one from the destination node, combined with the sign bit, and this value is fed back. The simplified algorithm avoids the high computational cost of evaluating exponential and logarithmic functions.

The multiplication by \alpha is usually moved into the variable node processing unit. The check node update then becomes:

L_{mn} = (-1)^{\overline{\sigma_m \oplus \sigma_{mn}}} \min_{n' \in N(m) \setminus n} |Z_{mn'}|

and the variable node update becomes:

Z_{m_i n} = F_n + \alpha \sum_{m \in M(n) \setminus m_i} L_{mn}
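The two min-sum updates can be sketched as follows. This is an illustration rather than the patented circuit: the function names are hypothetical, the value of `alpha` is a placeholder, and the sign of each check node output is computed as the product of the signs of the other inputs, which is the common min-sum convention and is assumed here because the text does not define \sigma_m explicitly.

```python
def check_node_update(z_row):
    """Min-sum messages L_mn for one check node, without the alpha factor.

    z_row holds Z_mn' for every n' in N(m); output k excludes input k.
    """
    out = []
    for k in range(len(z_row)):
        others = z_row[:k] + z_row[k + 1:]
        sign = 1
        for z in others:
            if z < 0:
                sign = -sign          # assumed rule: product of the other signs
        out.append(sign * min(abs(z) for z in others))
    return out


def variable_node_update(f_n, l_col, alpha=0.75):
    """Z_{m_i n} = F_n + alpha * sum of L_mn over M(n) excluding m_i.

    alpha=0.75 is a placeholder; the text does not give its value.
    """
    total = sum(l_col)
    return [f_n + alpha * (total - l) for l in l_col]


l_msgs = check_node_update([3, -1, 4, 2, -5, 6])        # row degree 6
z_msgs = variable_node_update(1.0, [2, -4, 6], alpha=0.5)  # column degree 3
```

Applying \alpha in the variable node step, as the text describes, keeps the check node logic free of multipliers.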

however, the conventional LDPC decoding algorithm has the following disadvantages: (1) 108 RAM blocks are needed for carrying out row matrix and column matrix calculation result storage, and the required storage space is large; (2) because the full parallel mode is adopted, the operation units of 18 CNU (check node) nodes and 36 VNU (variable node) nodes are needed, and the used operation units are more.

In summary, it is known that the conventional LDPC decoding algorithm in the prior art has a problem of consuming chip resources due to a large required storage space, and therefore, it is necessary to provide an improved technical solution to solve the problem.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention mainly aims to provide the LDPC decoder and the implementation method thereof, which not only can save the storage space of a chip, but also can meet the performance requirement of a CMMB system and support 1/2 code rates and 3/4 code rates.

To achieve the above and other objects, the present invention provides an LDPC decoder comprising:

the input buffer unit comprises an H matrix array unit, a V matrix array unit and an initial information storage unit, and the input buffer unit receives initialized input data and writes the initialized input data into the H matrix array unit and the initial information storage unit;

the state control and address generation module is used for realizing the control function of the whole decoder and generating address information of the processing stages of the check node processing module and the variable node processing module;

the check node processing module comprises 5 check node processing units, reads the data of the H matrix array unit under the control of the state control and address generation module to complete the updating from all check nodes to variable nodes, and stores the information from the check node to the bit node of the current iteration in the V matrix array unit;

the variable node processing module is connected with the V matrix array unit and the initial information storage unit, comprises 9 variable node processing units and is used for finishing the updating from all variable nodes to check nodes and simultaneously outputting a hard decision result obtained by the current variable node in the current iteration process to the decision code word storage unit; and

a decision codeword storage unit, which outputs the decoded information codeword.

Further, the LDPC decoder also comprises an LMN register set and a ZMN register set, wherein the LMN register set stores the check-node-to-variable-node information after the check node processing module completes its operation, and the ZMN register set stores the variable-node-to-check-node information after the variable node processing module completes its operation.

Furthermore, the LDPC decoder also comprises an index table storage unit, which stores the indexes for the different code rates and outputs the information codeword to the decision codeword storage unit after decoding is finished.

Furthermore, 5 check node processing units of the check node processing module work simultaneously, and complete the operation of 18 check nodes through 4 cycles, and complete the update of all check nodes to variable nodes once through 256 × 4 cycles.

Further, the check node processing module substitutes a hard decision result of a corresponding variable node of the last iteration cycle into a check matrix judgment formula to obtain whether corresponding information meets the current check, and when all information meets the check formula, iteration is finished; or when the maximum iteration number is reached, the decoding is finished.

Furthermore, the check node processing module adopts a pipeline technology design, and divides the processing unit into a plurality of stages of pipeline processing.

Further, the check node processing module adopts a 9-stage pipeline structure design.

Furthermore, the 9 variable node processing units of the variable node processing module complete the operations of 36 VNU nodes through 4 cycles, and complete the updating of all variable nodes to check nodes once through 256 × 4 cycles.

Furthermore, the variable node processing module adopts a pipeline technology design, and divides the processing unit into a plurality of stages of pipeline processing.

Further, the variable node processing module adopts a 7-stage pipeline structure design.

Further, the H matrix array unit is composed of 30 RAMs with a data width of 9 bits, the highest bit of which stores the last iteration decoding output, and the lower 8 bits of which are the information from the bit node to the check node of the last iteration.

Further, the V matrix array unit is composed of 27 RAMs with 8-bit data width, and 8 bits store the check node to bit node information of the current iteration.

Further, the decision codeword storage unit is a 9-bit wide, 1024-deep memory.

Furthermore, the index table storage unit is used for storing 9216 × 2 index tables, indexes with 1/2 code rates are stored in 0-9215, indexes with 3/4 code rates are stored in 9216-18431, and after decoding is completed, 9216 code words are rearranged according to the order of the index table storage unit.
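The index table lookup can be illustrated with a short sketch. The direction of the permutation is an assumption (output position k is taken from the bit addressed by table entry k), since the text only says that the codeword is rearranged in index table order; `reorder_codeword` and the toy table are hypothetical.

```python
def reorder_codeword(bits, rom_index, rate, code_len=9216):
    """Rearrange decoded bits using the index table segment for `rate`.

    Entries [0, code_len) hold the rate 1/2 indexes and entries
    [code_len, 2 * code_len) hold the rate 3/4 indexes.
    Assumption: output bit k is bits[rom_index[base + k]].
    """
    base = 0 if rate == "1/2" else code_len
    return [bits[rom_index[base + k]] for k in range(code_len)]


# Toy table for a 4-bit "codeword" (illustrative values only).
toy_rom = [2, 0, 1, 3,  3, 2, 1, 0]
toy_bits = [1, 1, 0, 0]
half = reorder_codeword(toy_bits, toy_rom, "1/2", code_len=4)
three_q = reorder_codeword(toy_bits, toy_rom, "3/4", code_len=4)
```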

Further, the LMN register set includes two register arrays, each a 27 × 4 register array; the ZMN register set likewise includes two register arrays, each a 30 × 4 register array.

Further, the LDPC decoder adopts VLSI hardware architecture.

Further, the state control and address generation module controls the LDPC decoder to be scheduled among an idle state, an initialization state, a check node operation state and a variable node operation state.

To achieve the above and other objects, the present invention further provides a method for implementing an LDPC decoder, comprising the steps of:

initializing a system, and writing channel information into an H matrix array unit and an initial information storage unit;

the check node processing module uses 5 check node processing units to complete the update from all check nodes to variable nodes, and stores the check-node-to-bit-node information of the current iteration in the V matrix array unit; and

the variable node processing module uses 9 variable node processing units to complete the update from all variable nodes to check nodes, and at the same time outputs the hard decision result obtained by the current variable node in the current iteration to the decision codeword storage unit.

Furthermore, the check node module and the variable node processing module adopt a pipeline technology design, and the processing unit is divided into multi-stage pipeline processing to update the check nodes and the variable nodes.

Further, the system is initialized and simultaneously outputs the code word of the last decoding judgment.

Compared with the prior art, the LDPC decoder and the implementation method thereof store the calculation results of the row matrix and the column matrix in only 57 RAM blocks, which reduces the number of RAMs by nearly 50% and reduces the storage space required to implement an LDPC decoder conforming to the CMMB standard. In addition, the invention uses only 5 CNU arithmetic units for check node operations and only 9 VNU arithmetic units for variable node operations, which reduces the arithmetic units required to implement an LDPC decoder conforming to the CMMB standard and saves chip resources.
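The resource figures quoted above can be checked with simple arithmetic. The sketch below only restates numbers given in the text (30 + 27 RAMs against 108, 5 CNUs for 18 check nodes, 9 VNUs for 36 variable nodes, 256 groups per pass); the variable names are illustrative.

```python
import math

ram_h, ram_v = 30, 27                 # RAMs in the H and V matrix arrays
total_ram = ram_h + ram_v             # RAM blocks used by the invention
ram_saving = 1 - total_ram / 108      # fraction saved versus 108 blocks

cnu_cycles = math.ceil(18 / 5)        # cycles for 5 CNUs to cover 18 check nodes
vnu_cycles = math.ceil(36 / 9)        # cycles for 9 VNUs to cover 36 variable nodes
cycles_per_pass = 256 * cnu_cycles    # cycles for one full update pass
```

The saving works out to about 47%, which is the "nearly 50%" stated in the text.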

Drawings

FIG. 1 is a schematic structural diagram of an LDPC decoder according to the present invention;

FIG. 2 is a diagram illustrating the structure of an LDPC decoder according to a preferred embodiment of the present invention;

FIG. 3 is a system state diagram controlled by the state control and address generation module 12 according to the present invention;

fig. 4 and fig. 5 are schematic diagrams illustrating scheduling of CNU stage system resources at code rates of 1/2 and 3/4, respectively;

FIG. 6 is a schematic diagram of a VNU computing unit according to the present invention;

FIG. 7 is a diagram illustrating an implementation of a CNU unit according to the present invention.

Detailed Description

Other advantages and capabilities of the present invention will be readily apparent to those skilled in the art from the present disclosure by describing the embodiments of the present invention with specific embodiments thereof in conjunction with the accompanying drawings. The invention is capable of other and different embodiments and its several details are capable of modification in various other respects, all without departing from the spirit and scope of the present invention.

FIG. 1 is a schematic structural diagram of an LDPC decoder according to the present invention. As shown in fig. 1, an LDPC decoder of the present invention at least includes an input buffer unit 11, a state control and address generation module 12, a check node processing module 13, a variable node processing module 14, and a decision codeword storage unit 15.

The input buffer unit 11 comprises an H matrix array unit 16, a V matrix array unit 17, and an initial information storage unit 18; after receiving initialized input data, it writes the data into the H matrix array unit and the initial information storage unit. The state control and address generation module 12 implements the control function of the entire decoder and generates the address information for the processing stages of the check node processing module and the variable node processing module, that is, it controls the data flow and generates the RAM addresses of the CNU stage and the VNU stage. The check node processing module 13 comprises 5 check node processing units; under the control of the state control and address generation module, it reads the data of the H matrix array unit to complete the update from all check nodes to variable nodes and stores the check-node-to-bit-node information of the current iteration in the V matrix array unit. At the same time, the check node processing module 13 substitutes the hard decision results of the corresponding variable nodes from the previous iteration into the check-matrix decision formula to determine whether the corresponding information satisfies the current check; the iteration ends when all information satisfies the check formula, and decoding ends when the maximum number of iterations is reached. The variable node processing module 14 is connected to the V matrix array unit and the initial information storage unit and comprises 9 variable node processing units; it completes the update from all variable nodes to check nodes and at the same time outputs the hard decision result obtained by the current variable node in the current iteration to the decision codeword storage unit. The decision codeword storage unit 15 stores the decision codeword determined by the variable node processing module 14 in the current iteration and outputs the decoded information codeword.

Preferably, the LDPC decoder of the present invention further comprises an LMN register set 19, a ZMN register set 20, and an index table storage unit 21. The LMN register set 19 stores the check-node-to-variable-node information after the check node processing module 13 completes its operation; the ZMN register set 20 stores the variable-node-to-check-node information after the variable node processing module 14 completes its operation. The index table storage unit 21 stores the indexes for the different code rates and outputs the information codeword to the decision codeword storage unit after decoding is completed.

FIG. 2 is a diagram illustrating the structure of an LDPC decoder according to a preferred embodiment of the present invention. The preferred embodiment of the present invention is a VLSI design implementation of LDPC decoder, but the present invention is not limited thereto. The structure of the LDPC decoder of the present invention will be further described by referring to FIG. 2. As shown in fig. 2, the VLSI implementation structure of an LDPC decoder of the present invention is illustrated as follows:

(1) input buffer unit: the input data is written directly into the H matrix array 16, namely the RAM_H array, to initialize it, and is also stored in the initial information storage unit 18, namely RAM_in0, RAM_in1 and RAM_in2, so that it can take part in the decoding operation in the VNU (variable node processing unit) stage;

(2) check node processing module 13 (CNU): consists of 5 CNU basic arithmetic units. The 5 CNUs work simultaneously and complete the operation of 18 CNU nodes in 4 cycles, so one update from all check nodes to variable nodes takes 256 × 4 cycles. At the same time, the hard decision results of the corresponding variable nodes from the previous iteration are substituted into the check-matrix decision formula to determine whether the corresponding information satisfies the current check. The iteration ends when all information satisfies the check formula, and decoding ends when the maximum number of iterations is reached. Here, the check-matrix decision formula is H \hat{x} = 0 \mod 2, where \hat{x} is the hard-decision vector;

(3) variable node processing module 14 (VNU): consists of 9 VNU basic arithmetic units, which complete the operation of 36 VNU nodes in 4 cycles, so one update from all variable nodes to check nodes takes 256 × 4 cycles. At the same time, the hard decision result of the current variable node in this iteration is obtained from the sign of Zn;

(4) the state control and address generation module 12: mainly controlling data flow and generating RAM addresses of CNU and VNU stages;

(5) the H matrix array unit 16 is a single-port RAM_H array composed of 30 RAMs with a 9-bit data width. The most significant bit of each RAM_H word stores the decoding output of the previous iteration, and the lower 8 bits hold the bit-node-to-check-node LLR information of the previous iteration. The array is organized as 5 rows of 6 RAMs each, numbered 0-29. RAMs 0-17 have a depth of 1024, divided into 4 banks of 256 words each. RAMs 18-29 have a depth of 768, divided into 3 banks of 256 words each;

(6) the V matrix array unit 17 is a single-port RAM_V array composed of 27 RAMs with an 8-bit data width; the 8 bits store the check-node-to-bit-node LLR information of the current iteration. The array is organized as 9 rows of 3 RAMs each, numbered 0-26. Every RAM has a depth of 1024, divided into 4 banks of 256 words each, namely bank0, bank1, bank2 and bank3;

(7) the decision codeword storage unit 15 is a decision codeword RAM that stores only the codeword decided by the VNU units in the current iteration; it is a memory 9 bits wide and 1024 deep;

(8) index table storage unit 21: namely ROM_index, a ROM that stores the 9216 × 2 index table. Entries 0-9215 store the index for code rate 1/2, and entries 9216-18431 store the index for code rate 3/4. After decoding is finished, the 9216 codeword bits are rearranged according to the order given by ROM_index, and the information codeword is then output;

(9) LMN register set 19: namely the REG_V register set, which includes two REG_V arrays, reg0_v and reg1_v, each a 27 × 4 register array. It stores the check-node-to-variable-node information Lmn produced by the CNU operation. After 4 cycles, one row of all 27 reg0_v arrays is full, and reg0_v is then transferred to reg1_v. During the next 4 cycles, the values in reg1_v0-3 are written into the 27 RAMs RAM_V0-26, respectively;

(10) ZMN register set 20: namely the REG_H register set, which includes two REG_H arrays, reg0_h and reg1_h, each a 30 × 4 register array. It stores the variable-node-to-check-node information Zmn produced by the VNU operation. After 4 cycles, one row of all 30 reg0_h arrays is full, and reg0_h is then transferred to reg1_h. During the next 4 cycles, the values in reg1_h0-3 are written into the 30 RAMs RAM_H0-29, respectively.
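A quick consistency check on the memory sizes listed above: with code length 9216 and column degree 3, the H matrix has 9216 × 3 non-zero entries, and both RAM arrays hold exactly that many words. The sketch below only recomputes figures from the text; `bank_and_offset` is a hypothetical helper showing how an address would split into a 256-word bank and an offset.

```python
ram_h_depths = [1024] * 18 + [768] * 12   # RAM_H 0-17 and RAM_H 18-29
ram_v_depths = [1024] * 27                # RAM_V 0-26

h_entries = sum(ram_h_depths)             # 9-bit words in the RAM_H array
v_entries = sum(ram_v_depths)             # 8-bit words in the RAM_V array
edges = 9216 * 3                          # non-zero entries of H (column degree 3)

h_bits = h_entries * 9                    # total RAM_H storage in bits
v_bits = v_entries * 8                    # total RAM_V storage in bits


def bank_and_offset(addr, bank_depth=256):
    """Split a RAM address into (bank number, offset within the bank)."""
    return divmod(addr, bank_depth)
```

Both arrays therefore provide one message slot per non-zero element of H, which is consistent with storing only the non-zero elements of each sub-matrix.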

FIG. 3 is a system state diagram controlled by the state control and address generation module 12 according to the present invention. The scheduling of the LDPC decoder of FIG. 2 will be described below in conjunction with FIG. 3. In the preferred embodiment of the present invention, the system states include an idle state, an initialization state, a check node operation state and a variable node operation state, which are respectively described as follows:

(1) IDLE state (IDLE): when the system is reset, the system is in an IDLE state. When the system decoding is finished, the system will return to the IDLE state;

(2) initialization state: in the system initialization stage, channel information is written into the RAM_H array to initialize its RAMs. In this state, the codeword decided by the previous decoding can be output at the same time;

(3) check node operation (CNU) stage, i.e. the check node update stage. In this stage the check node arithmetic units (CNU units) perform the calculation, with 5 CNU units computing simultaneously. In the preferred embodiment of the present invention, a 9-stage pipeline is used: RAM_H read address (1), RAM_H data read (1), CNU operation (2), write to the REG0 register bank (4), and write the REG1 register contents to the RAM_V matrix (1). All check nodes are updated after 1032 cycles. In this stage, the codeword of the previous iteration is also checked against the check equation; if it satisfies the equation, decoding ends. In the CNU stage, each CNU unit (i) reads data from the 6 RAMs of the RAM_H matrix (J columns) every cycle (t) to take part in the calculation. The read order is: address a starts at 0 and increments by 1 every cycle. After the calculation is completed, the data is written into the RAM_V matrix array. The write order is: address a starts from its initial value and increments by 1 every cycle;

(4) variable node operation (VNU) stage, i.e. the bit node update stage. In this stage the variable node arithmetic units (VNU units) perform the calculation, with 9 VNU units computing simultaneously. In the preferred embodiment of the present invention, a 7-stage pipeline is used: RAM_V read address (1), RAM_V data read and VNU operation (1), write to the REG0 register bank (4), and write the REG1 register contents to the RAM_H matrix (1). All bit nodes are updated after 1030 cycles. In this stage, the codeword is decided at the same time, giving the codeword of this iteration. In the VNU stage, each VNU unit (J) reads data from the 3 RAMs of RAM_V (J column) every cycle (t) to take part in the calculation. The read order is: address a starts at 0 and increments by 1 every cycle. After the calculation is completed, the data is written into the RAM_H matrix array. The write order is: address a starts from its initial value and increments by 1 every cycle.
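The four states and the cycle counts can be modeled with a small sketch. The transition rules are an assumption pieced together from the state descriptions (FIG. 3 itself is not reproduced in the text), `max_iter=20` is a placeholder, and the cycle formula reflects the observation that 1032 and 1030 are consistent with 1024 issue cycles plus the pipeline depth minus one.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    INIT = 1
    CNU = 2
    VNU = 3


def phase_cycles(issue_cycles, pipeline_depth):
    """Cycles until the last result leaves a pipeline that issues once per cycle."""
    return issue_cycles + pipeline_depth - 1


def next_state(state, start=False, checks_ok=False, iteration=0, max_iter=20):
    """Assumed transition rule; max_iter is a placeholder value."""
    if state is State.IDLE:
        return State.INIT if start else State.IDLE
    if state is State.INIT:
        return State.CNU
    if state is State.CNU:
        done = checks_ok or iteration >= max_iter
        return State.IDLE if done else State.VNU
    return State.CNU   # the VNU stage hands back to the CNU stage


cnu_total = phase_cycles(256 * 4, 9)   # 9-stage CNU pipeline
vnu_total = phase_cycles(256 * 4, 7)   # 7-stage VNU pipeline
```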

Fig. 4 and 5 are schematic diagrams illustrating scheduling of system resources in CNU stages at code rates of 1/2 and 3/4, respectively, and the following describes the CNU unit scheduling in accordance with the preferred embodiment of the present invention in detail with reference to fig. 4 and 5.

For code rate 1/2, the scheduling is shown in FIG. 4: the 1st cycle computes basic units 0-4, the 2nd cycle computes 5-9, the 3rd cycle computes 10-14, and the 4th cycle computes 15-17. All 18 basic units are computed in 4 cycles, and all 4608 check nodes are computed in 256 × 4 cycles.

For code rate 3/4, the scheduling is shown in FIG. 5: the 1st cycle computes basic units 0-1, the 2nd cycle computes 2-4, the 3rd cycle computes 5-6, and the 4th cycle computes 7-8. All 9 basic units are computed in 4 cycles, and all 2304 check nodes are computed in 256 × 4 cycles.

In addition, in the preferred embodiment of the present invention, the VNU units are scheduled as follows: the first cycle computes basic units 0-8, the second cycle computes 9-17, the third cycle computes 18-26, and the fourth cycle computes 27-35. All 36 basic units are computed in 4 cycles, and all 9216 variable nodes are completed in 256 × 4 cycles.
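The rate 1/2 CNU schedule and the VNU schedule are plain fixed-size chunking, which the following sketch reproduces; `chunk_schedule` is a hypothetical helper. The rate 3/4 grouping is uneven, so it is written out as a literal copied from the text rather than generated.

```python
def chunk_schedule(n_units, per_cycle):
    """Assign basic units 0..n_units-1 to cycles, per_cycle units at a time."""
    return [list(range(start, min(start + per_cycle, n_units)))
            for start in range(0, n_units, per_cycle)]


cnu_half = chunk_schedule(18, 5)    # rate 1/2: 5 CNUs over 18 basic units
vnu = chunk_schedule(36, 9)         # 9 VNUs over 36 basic units

# Rate 3/4 grouping as listed in the text (2, 3, 2, 2 units per cycle).
cnu_three_quarter = [[0, 1], [2, 3, 4], [5, 6], [7, 8]]
```

In every case the schedule takes exactly 4 cycles, so the same 256 × 4 cycle address sequence serves both code rates.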

FIG. 6 is a schematic diagram of an implementation of a VNU unit according to the present invention. The implementation of the VNU arithmetic unit is described below with reference to FIG. 6. As shown in FIG. 6, P1/P2/P3 are the data read from RAM, and the 9 VNU units use a pipeline structure. The hard decision bit (1 bit) decided from Fn_out is stored in the output RAM. 9 bits are stored at a time, so all 9216 hard decision bits are stored after 4 × 256 cycles.

FIG. 7 is a diagram illustrating an implementation of a CNU unit according to the present invention. The implementation of the CNU arithmetic unit is described below with reference to FIG. 7. To reduce resources, the algorithm is analyzed, and it is found that the outputs take only two distinct magnitudes, namely the minimum and the second-smallest of all input values. For the input that is the global minimum, the corresponding output is the second-smallest value; for every other input, the corresponding output is the minimum. It follows that all outputs can be obtained from the minimum value, the second-smallest value, and the position of the input that holds the minimum.
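The two-value observation can be verified directly. In this sketch, `cnu_magnitudes` builds every output from the minimum, the second-smallest value, and the minimum position, while `brute_force` computes the minimum of the other inputs for each position; both names are hypothetical and the inputs are magnitudes only (signs are handled separately).

```python
def cnu_magnitudes(mags):
    """Per-input output magnitude from (min, second min, min position) only."""
    pos = min(range(len(mags)), key=lambda k: mags[k])
    min1 = mags[pos]
    min2 = min(m for k, m in enumerate(mags) if k != pos)
    return [min2 if k == pos else min1 for k in range(len(mags))]


def brute_force(mags):
    """Reference: for each input, the minimum over all the other inputs."""
    return [min(mags[:k] + mags[k + 1:]) for k in range(len(mags))]


sample = [3, 1, 4, 2, 5, 6]
fast = cnu_magnitudes(sample)
slow = brute_force(sample)
```

The two results agree, which is why the hardware only needs to carry three quantities through the pipeline instead of one comparison tree per output.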

Code rate 3/4 has 12 data inputs and uses a 4-stage pipeline.

Code rate 1/2 has 6 data inputs and uses a 3-stage pipeline. A selector is added in front of D2_1, D2_2, ..., D2_6 in the figure; at code rate 1/2, the D2_n data is taken directly from the input data.

First pipeline stage: the inputs are compared in pairs, giving the minimum of each pair (placed in the upper blocks D1_1, D1_3, ..., D1_11), the second-smallest value of each pair (placed in the lower blocks D1_2, D1_4, ..., D1_12), and the positions of the minima L1_1 to L1_6 (bit width 1 bit).

Second pipeline stage: D1_1 is compared with D1_3 to obtain the minimum D2_1, which is the minimum of Data1 to Data4. At the same time, the position of the minimum is stored in register L2_1 (bit width 2 bits). Similarly, the minimum of Data5 to Data8 with its position L2_2 and the minimum of Data9 to Data12 with its position L2_3 are obtained.

The second-smallest value is obtained as follows: if D1_1 < D1_3, D1_2 is compared with D1_3 and the smaller of the two is sent to D2_2. Conversely, if D1_1 > D1_3, D1_1 is compared with D1_4 and the smaller of the two is sent to D2_2.

Third pipeline stage: D2_1 is compared with D2_3 to obtain the minimum D3_1, which is the minimum of Data1 to Data8. At the same time, the position of the minimum is stored in register L3_1 (bit width 3 bits). The minimum of Data9 to Data12 is D2_5, which is passed directly to D3_3, and its position is stored in register L3_2 (bit width 3 bits).

The second-smallest value is obtained as follows: if D2_1 < D2_3, D2_2 is compared with D2_3 and the smaller of the two is sent to D3_2. Conversely, if D2_1 > D2_3, D2_1 is compared with D2_4 and the smaller of the two is sent to D3_2.

Fourth pipeline stage: D3_1 is compared with D3_3 to obtain the minimum D4_1 (MIN), which is the minimum of Data1 to Data12. At the same time, the position of the minimum is stored in register L4_1 (bit width 4 bits).

The second-smallest value is obtained as follows: if D3_1 < D3_3, D3_2 is compared with D3_3 and the smaller of the two is sent to D4_2. Conversely, if D3_1 > D3_3, D3_1 is compared with D3_4 and the smaller of the two is sent to D4_2 (2nd_min).
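The staged comparison can be modeled in software as a tree that merges (minimum, second-smallest, position) triples. This is a behavioral sketch, not the register-level design: `merge` and `min_tree` are hypothetical names, positions are kept as absolute input indexes rather than the growing position registers of the figure, ties are resolved in favor of the left operand (the text does not say), and the input count is assumed to be even.

```python
def merge(a, b):
    """Merge two (min, second_min, position) triples, as one later stage does."""
    a_min, a_2nd, a_pos = a
    b_min, b_2nd, b_pos = b
    if a_min <= b_min:
        # a holds the minimum; the runner-up is a's second value or b's minimum
        return a_min, min(a_2nd, b_min), a_pos
    return b_min, min(a_min, b_2nd), b_pos


def min_tree(mags):
    """Return (min, second_min, position) using pairwise stages."""
    # Stage 1: compare inputs in pairs.
    nodes = []
    for k in range(0, len(mags), 2):
        x, y = mags[k], mags[k + 1]
        nodes.append((x, y, k) if x <= y else (y, x, k + 1))
    # Later stages: merge neighbors; an unpaired node passes straight through.
    while len(nodes) > 1:
        nxt = [merge(nodes[k], nodes[k + 1]) for k in range(0, len(nodes) - 1, 2)]
        if len(nodes) % 2:
            nxt.append(nodes[-1])
        nodes = nxt
    return nodes[0]


twelve = min_tree([7, 3, 9, 4, 8, 2, 6, 5, 12, 11, 10, 1])   # rate 3/4 width
six = min_tree([3, 1, 4, 2, 5, 6])                            # rate 1/2 width
```

With 12 inputs the loop runs three merge rounds after the first stage, matching the 4-stage pipeline, and the unpaired third node in the third round corresponds to the value passed directly from D2_5 to D3_3.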

In summary, compared with prior-art implementations of the LDPC algorithm, the VLSI implementation structure of the LDPC decoder of the present invention has the following main advantages:

1. Storage space

The LDPC algorithm in the prior art needs 108 blocks of RAM to store the calculation results of the row matrix and the column matrix. The invention uses only 57 blocks of RAM for the same purpose, so the number of RAM blocks is reduced by nearly 50% (from 108 to 57, about 47%). Therefore, the invention reduces the storage space required in implementing an LDPC decoder conforming to the CMMB standard.

2. Arithmetic unit

The LDPC algorithm in the prior art adopts a full parallel mode, and needs operation units of 18 CNU nodes and operation units of 36 VNU nodes. The invention only uses 5 CNU nodes and 9 VNU nodes. Therefore, the invention reduces the operation units required in the realization process of the LDPC decoder conforming to the CMMB standard.
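The resource figures quoted in the two points above, together with the counts recited in the claims, can be cross-checked with a few lines of arithmetic. This is only a sketch: the variable names and the helper cycles_needed are hypothetical, and the model of one node per processing unit per cycle is an assumption used to illustrate the time-multiplexing.

```python
import math

# RAM blocks: figures from the description; the 30 + 27 split is from the claims.
prior_rams = 108
h_rams, v_rams = 30, 27          # H matrix array unit, V matrix array unit
new_rams = h_rams + v_rams       # 57
ram_saving = 1 - new_rams / prior_rams
print(new_rams, round(ram_saving, 3))   # 57 0.472


def cycles_needed(nodes, units):
    """Cycles to serve `nodes` with `units` processors, one node per unit per cycle
    (assumed scheduling model)."""
    return math.ceil(nodes / units)


cnu_cycles = cycles_needed(18, 5)   # 5 CNUs serve 18 check nodes
vnu_cycles = cycles_needed(36, 9)   # 9 VNUs serve 36 variable nodes
print(cnu_cycles, vnu_cycles)       # 4 4

# One full check-node (or variable-node) update pass: 256 groups x 4 cycles.
half_iteration_cycles = 256 * 4
print(half_iteration_cycles)        # 1024

# Decision codeword memory: 9 bits wide, 1024 deep.
codeword_bits = 9 * 1024
print(codeword_bits)                # 9216
```

Under this model, 5 CNUs over 4 cycles provide 20 processing slots for 18 check nodes, so two slots remain idle, whereas 9 VNUs over 4 cycles match the 36 variable nodes exactly. The 9216 bits of the decision codeword memory agree with the 9216 codeword indexes mentioned for the index table.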

The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Modifications and variations can be made to the above-described embodiments by those skilled in the art without departing from the spirit and scope of the present invention. Therefore, the scope of the invention should be determined from the following claims.

Claims

1. An LDPC decoder comprising at least:

a decision code word storage unit for outputting the decoded information code word; wherein,

the 5 check node processing units of the check node processing module work simultaneously, complete the operation of 18 check nodes in 4 cycles, and complete one update of all check nodes to variable nodes in 256 × 4 cycles;

the 9 variable node processing units of the variable node processing module complete the operation of 36 VNU nodes through 4 cycles, and complete the updating of all variable nodes to check nodes once through 256 × 4 cycles.

2. The LDPC decoder of claim 1, wherein: the LDPC decoder also comprises an LMN register set and a ZMN register set, wherein the LMN register set is used for storing information from the check node to the variable node after the operation of the check node processing module is completed; the ZMN register set is used for storing information from the variable node to the check node after the operation of the variable node processing module is completed.

3. The LDPC decoder of claim 2, wherein: the LDPC decoder also comprises an index table storage unit which is used for storing indexes with different code rates and outputting information code words to the judgment code word storage unit after decoding is finished.

4. The LDPC decoder of claim 1, wherein: the check node processing module substitutes the hard decision result of the corresponding variable node in the previous iteration into the check matrix judgment formula to determine whether the corresponding information satisfies the current check; when all information satisfies the check formula, the iteration is finished; or when the maximum number of iterations is reached, the decoding is finished.

5. The LDPC decoder of claim 4, wherein: the check node processing module adopts a pipeline technology design, and divides a processing unit into multi-stage pipeline processing.

6. The LDPC decoder of claim 5, wherein: the check node processing module adopts a 9-stage pipeline structure design.

7. The LDPC decoder of claim 1, wherein: the variable node processing module adopts a pipeline technology design, and divides a processing unit into multi-stage pipeline processing.

8. The LDPC decoder of claim 7, wherein: the variable node processing module adopts a 7-stage pipeline structure design.

9. The LDPC decoder of claim 1, wherein: the H matrix array unit is composed of 30 RAMs with 9-bit data width, the highest bit of the H matrix array unit stores the last iteration decoding output, and the lower 8 bits of the H matrix array unit are the information from the bit node to the check node of the last iteration.

10. The LDPC decoder of claim 1, wherein: the V matrix array unit consists of 27 RAMs with 8-bit data width, and 8 bits store the information from check nodes to bit nodes of the iteration.

11. The LDPC decoder of claim 1, wherein: the decision codeword storage unit is a 9-bit wide, 1024-deep memory.

12. The LDPC decoder of claim 3, wherein: the index table storage unit is used for storing an index table with 9216 × 2 entries, indexes for the 1/2 code rate are stored at addresses 0 to 9215, indexes for the 3/4 code rate are stored at addresses 9216 to 18431, and after decoding is completed, the 9216 code words are rearranged according to the order given by the index table storage unit.

13. The LDPC decoder of claim 2, wherein: the LMN register set comprises two groups of register arrays, wherein each group is a 27 × 4 register array; the ZMN register set also comprises two groups of register arrays, wherein each group is a 30 × 4 register array.

14. The LDPC decoder of claim 2, wherein: the LDPC decoder adopts VLSI hardware architecture.

15. The LDPC decoder of claim 2, wherein: the state control and address generation module controls the LDPC decoder to be scheduled among an idle state, an initialization state, a check node operation state and a variable node operation state.

16. An implementation method of an LDPC decoder comprises the following steps:

the variable node processing module updates all variable nodes to check nodes by using 9 variable node processing units, and simultaneously outputs a hard decision result obtained by the current variable node in the current iteration process to a decision codeword storage unit; wherein,

the 5 check node processing units work simultaneously, complete the operation of 18 check nodes in 4 cycles, and complete one update of all check nodes to variable nodes in 256 × 4 cycles;

the 9 variable node processing units complete the operation of 36 VNU nodes through 4 cycles, and complete the update of all variable nodes to check nodes once through 256 × 4 cycles.

17. The method of implementing an LDPC decoder as recited in claim 16, wherein: the check node module and the variable node processing module are designed by adopting a pipeline technology, and the processing unit is divided into multi-stage pipeline processing to update the check nodes and the variable nodes.

18. The method of implementing an LDPC decoder as recited in claim 16, wherein: the code word of the previous decoding decision is output while the system is being initialized.