CN112187289A - LDPC decoder implementation method based on hardware implementation - Google Patents

LDPC decoder implementation method based on hardware implementation

Info

Publication number
CN112187289A
CN112187289A
Authority
CN
China
Prior art keywords
variable
values
storing
value
variable node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011160601.0A
Other languages
Chinese (zh)
Inventor
左超
张晓磊
沈德同
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University Of Technology Intelligent Computing Imaging Research Institute Co ltd
Original Assignee
Nanjing University Of Technology Intelligent Computing Imaging Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University Of Technology Intelligent Computing Imaging Research Institute Co ltd
Priority to CN202011160601.0A
Publication of CN112187289A

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/11 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits using multiple parity bits

Abstract

The invention provides a hardware-based method for implementing an LDPC decoder, intended to improve both the speed and the generality of the decoder. The number of parallel rows is determined by the size of the expansion factor of the quasi-cyclic matrix, and pipelined operation is used to carry out check node updating, variable node updating and the decoding decision across all rows, so that the time spent on each iteration is minimized. A method is provided that converts the variable node update into the difference between the variable node values involved in the check node update and the previous check node value, which removes the traversal of the variable nodes, greatly shortens the iteration time, and saves the storage space for updated variable node values. Combined with the layered min-sum algorithm, very few iterations are required and the decoding speed is greatly improved.

Description

LDPC decoder implementation method based on hardware implementation
Technical Field
The invention belongs to the field of communication, and particularly relates to an LDPC decoder implementation method based on hardware implementation.
Background
LDPC (Low-Density Parity-Check) codes were first proposed by Robert G. Gallager in his Ph.D. thesis in the early 1960s. Limited by the technology of the time and lacking a feasible decoding algorithm, they were largely ignored for the following 35 years. During this period, Tanner generalized LDPC codes in 1981 and introduced a graph representation of them, later named the Tanner graph. After Berrou et al. discovered Turbo codes in 1993, MacKay, Neal and others re-examined LDPC codes around 1995 and proposed a feasible decoding algorithm, revealing the excellent performance of LDPC codes and quickly attracting strong interest and attention. After more than a decade of further research, breakthroughs have been made in many aspects, the technology related to LDPC codes has matured, commercial applications have begun to appear, and LDPC codes have entered standards in related fields such as wireless communication.
An LDPC code is a linear code defined by a check matrix. To make decoding feasible, the check matrix must be sparse when the code length is long, i.e. the density of 1s in the check matrix must be low: the number of 1s must be far smaller than the number of 0s, and the longer the code, the lower the required density. Among the many error correction algorithms, LDPC codes can approach the Shannon limit, can complete error reconciliation in a single interaction, and are amenable to hardware implementation; for these reasons, LDPC decoding has been introduced into the error correction (reconciliation) step of quantum key distribution (QKD).
For the same LDPC code, different decoding algorithms yield different error-rate performance: an excellent decoding algorithm achieves good error-rate performance, whereas an ordinary decoding algorithm achieves only mediocre performance.
Decoding algorithms for LDPC codes fall into three broad categories: hard-decision decoding, soft-decision decoding and hybrid decoding.
1. In hard-decision decoding, the received real-valued sequence is first demodulated, a hard decision is then made to obtain a 0/1 sequence, and the resulting hard-decision sequence is finally passed to a hard-decision decoder for decoding. The computational complexity of this approach is inherently low, but the hard-decision operation discards most of the channel information, so the utilization of channel information is low; both the channel information utilization and the decoding complexity of hard-decision decoding are the lowest of the three categories. Common hard-decision decoding algorithms include the bit-flipping (BF) algorithm and the one-step majority-logic (OSMLG) decoding algorithm; a minimal sketch of bit flipping is given after this overview.
2. Soft-decision decoding can be regarded as decoding with unquantized inputs: it makes full use of the received channel information (soft information), so the utilization of channel information is greatly improved; the channel information used by soft-decision decoding includes not only the sign of each received value but also its amplitude. This full use of the channel information greatly improves decoding performance, allows decoding to proceed iteratively, fully exploits the received channel information, and finally yields excellent error-rate performance. The channel information utilization and the decoding complexity of soft-decision decoding are the highest of the three categories. The most commonly used soft-decision decoding algorithm is the sum-product algorithm, also called the belief propagation (BP) algorithm; its check node update rule is recalled after this overview.
3. Hybrid decoding combines the characteristics of soft-decision and hard-decision decoding. It is a reliability-based decoding approach that uses part of the channel information to compute reliabilities on top of hard-decision decoding. Common hybrid decoding algorithms include the weighted bit-flipping (WBF) algorithm and the weighted OSMLG (WMLG) decoding algorithm.
Comparing the three categories: hard-decision decoding is simple to implement and fast but performs poorly; soft-decision decoding is relatively complex to implement and slow but performs best; hybrid decoding strikes a good balance between decoding performance, complexity and decoding speed, but its generality is poor and its speed still differs considerably from that of hard-decision decoding. An LDPC decoder implementation method that is both highly general and fast is therefore urgently needed.
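To make the hard-decision category concrete, the following is a minimal bit-flipping sketch. It is standard textbook material rather than part of the patent; the parity-check matrix H, the received bit vector and the iteration limit are hypothetical inputs.

```python
import numpy as np

def bit_flip_decode(H, bits, max_iter=50):
    """Minimal bit-flipping (BF) hard-decision decoder sketch.

    H    : (m, n) binary parity-check matrix
    bits : length-n 0/1 vector produced by the hard decision after demodulation
    """
    bits = bits.copy()
    for _ in range(max_iter):
        syndrome = H.dot(bits) % 2            # which parity checks currently fail
        if not syndrome.any():                # all checks satisfied: decoding done
            return bits, True
        fail_count = H.T.dot(syndrome)        # failing checks each bit takes part in
        bits[fail_count == fail_count.max()] ^= 1   # flip the most suspicious bits
    return bits, False
```

For the soft-decision category, the standard sum-product (BP) check-to-variable update, of which the min-sum rule used later in this description is the usual approximation, can be written as (again standard background, not quoted from the patent)

$$\tanh\left(\frac{m_{c \to v}}{2}\right) = \prod_{v' \in N(c)\setminus\{v\}} \tanh\left(\frac{m_{v' \to c}}{2}\right),$$

where $m_{v' \to c}$ are the incoming variable-to-check log-likelihood ratios and $N(c)$ is the set of variable nodes connected to check node $c$.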
Disclosure of Invention
In order to improve the speed and generality of LDPC decoders, the invention provides an LDPC decoder implementation method based on hardware implementation.
The technical scheme of the invention is as follows: a hardware-based LDPC decoder implementation method, the hardware comprising: a set of ROMs for storing the positions and sizes of the non-zero elements of the matrix; a set of variable node RAMs for storing the values of the variable nodes, the number of variable node RAMs being equal to the size of the expansion factor; a set of check node RAMs for storing the values of the check nodes; a set of temporary buffer RAMs for storing temporary comparison values between rows, the number of temporary buffer RAMs being equal to the product of the expansion factor and the maximum number of non-zero elements per row of the matrix; and a subtracter, a comparator and an adder arranged in sequence.
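As a rough illustration of how these memory blocks scale with the code parameters, the following sketch computes a size estimate for each block. The function name, the check node RAM layout and the example parameters (expansion factor 96, 12x24 base matrix, maximum row weight 8) are assumptions for illustration, not figures taken from the patent.

```python
def decoder_memory_plan(z, base_rows, base_cols, max_row_weight):
    """Rough sizing of the memory blocks listed above (a sketch, not the RTL)."""
    return {
        # one variable node RAM per column inside the expansion factor,
        # each holding one value per base-matrix column
        "variable_node_rams": {"count": z, "depth": base_cols},
        # check node values, grouped per base-matrix row (layout is an assumption)
        "check_node_rams": {"count": z, "depth": base_rows},
        # temporary comparison buffers: expansion factor times max row weight
        "temp_buffer_rams": {"count": z * max_row_weight, "depth": 1},
        # ROM entries describing the position and shift of each non-zero block
        "nonzero_element_rom_entries": base_rows * max_row_weight,
    }

# hypothetical IEEE-style parameters: expansion factor 96, 12x24 base matrix
print(decoder_memory_plan(z=96, base_rows=12, base_cols=24, max_row_weight=8))
```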
The LDPC decoder implementation method comprises the following steps:
step 1, selecting an initial value according to the estimated error rate, assigning it to the variable nodes, grouping them by the expansion factor, and storing them into the variable node RAMs in a pipelined manner;
step 2, determining the degree of parallelism according to the size of the expansion factor, each variable node RAM representing one column within the expansion factor; reading out the variable values at the corresponding depths according to the positions of the non-zero elements of the matrix, sorting them according to the offsets, and subtracting the values of the corresponding check nodes; comparing the result with the value read from the position of the next non-zero element in the same row, the number of comparisons being equal to the maximum number of non-zero elements per row of the matrix; after the comparisons are finished, finding and recording the minimum and second-minimum value in each row, then writing the second-minimum value into the check node RAM at the position of the minimum value and the minimum value into the other positions; reading, comparing and storing all being performed as pipeline operations;
step 3, storing the variable values that were read out in the variable node temporary buffer RAMs during the comparisons; after the comparisons are finished, summing each buffered variable value with the updated value of the corresponding check node and storing the result back into the variable node RAM, completing the variable update for use in the next check node update or in the decoding decision;
and step 4, the values of the variable nodes being updated after each base-matrix row is processed, and a decoding decision being made directly on the values of the variable nodes after all base-matrix rows have been iterated over. (A high-level software sketch of this iteration loop is given below.)
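For orientation, the following is a minimal software sketch of the layered min-sum iteration described by steps 1 to 4. It is not the patent's hardware pipeline: the base-matrix shift convention, the rule that a non-negative value decodes to bit 0, and the omission of any normalization factor and of the parity check are assumptions made for the sketch.

```python
import numpy as np

def layered_min_sum(base, z, llr_in, num_iters=10):
    """Layered min-sum sketch for a QC-LDPC code.

    base   : 2-D integer array of cyclic shifts (-1 = all-zero block,
             s >= 0 = z*z identity cyclically shifted by s)
    z      : expansion factor
    llr_in : initial variable node values, length z * base.shape[1]
    """
    rows, cols = base.shape
    llr = np.asarray(llr_in, dtype=float).copy()         # step 1: initialization
    # one stored check-to-variable value per non-zero block (z entries each)
    R = {(m, j): np.zeros(z) for m in range(rows)
         for j in range(cols) if base[m, j] >= 0}

    for _ in range(num_iters):
        for m in range(rows):                             # one layer per base-matrix row
            blocks = [j for j in range(cols) if base[m, j] >= 0]
            # step 2: read variable values with the row's offsets and subtract
            # the previous check node value
            Q = {j: np.roll(llr[j*z:(j+1)*z], -base[m, j]) - R[(m, j)] for j in blocks}
            mags = np.stack([np.abs(Q[j]) for j in blocks])
            signs = np.stack([np.where(Q[j] < 0, -1.0, 1.0) for j in blocks])
            total_sign = np.prod(signs, axis=0)
            order = np.argsort(mags, axis=0)
            min1 = np.take_along_axis(mags, order[:1], axis=0)[0]
            min2 = np.take_along_axis(mags, order[1:2], axis=0)[0]
            is_min = order[0][None, :] == np.arange(len(blocks))[:, None]
            for k, j in enumerate(blocks):
                # min-sum check node update: second minimum at the position of the
                # minimum, minimum everywhere else, sign from the other blocks
                R[(m, j)] = total_sign * signs[k] * np.where(is_min[k], min2, min1)
                # step 3: sum the buffered value and the new check node value
                llr[j*z:(j+1)*z] = np.roll(Q[j] + R[(m, j)], base[m, j])
        # step 4: after all base-matrix rows, decide each bit from the sign
        decided = (llr < 0).astype(int)
    return decided
```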
Preferably: the initialization formula of the step 1 is
Figure 381578DEST_PATH_IMAGE001
Wherein, in the step (A),iin order to be a column,jin order to do so,lin order to be able to perform the number of iterations,Pis an initial value.
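The initialization formula itself is reproduced only as an image in the original publication. For context, a typical initialization for this kind of decoder (an assumed reconstruction, not necessarily the patent's exact expression) sets each variable node to the channel log-likelihood ratio derived from the estimated error rate $p$:

$$L_i^{(0)} = (1 - 2 y_i)\,P, \qquad P = \ln\frac{1-p}{p}, \qquad R_{ji}^{(0)} = 0,$$

where $y_i \in \{0, 1\}$ is the received bit in column $i$ and $R_{ji}$ denotes the check node value for row $j$ and column $i$.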
Preferably: the check node updating formula of the step 2 is
Figure 384300DEST_PATH_IMAGE002
If the first iteration is the case, then,
Figure 284123DEST_PATH_IMAGE003
wherein, in the step (A),iin order to be a column,jin order to do so,lis the number of iterations.
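This formula is also reproduced only as an image. The standard layered min-sum check node update that the description appears to follow (an assumed reconstruction) is

$$Q_{ij}^{(l)} = L_i - R_{ji}^{(l-1)}, \qquad R_{ji}^{(l)} = \prod_{i' \in N(j)\setminus\{i\}} \operatorname{sign}\bigl(Q_{i'j}^{(l)}\bigr) \cdot \min_{i' \in N(j)\setminus\{i\}} \bigl|Q_{i'j}^{(l)}\bigr|,$$

with $R_{ji}^{(0)} = 0$ in the first iteration, where $N(j)$ is the set of columns holding non-zero elements in row $j$.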
Preferably: the variable node updating formula of the step 3 is
Figure 646971DEST_PATH_IMAGE004
Wherein, in the step (A),iin order to be a column,jin order to do so,lin order to be able to perform the number of iterations,Pis an initial value.
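Under the same assumption about the imaged formula, the layered variable node update simply adds the new check node value back to the buffered difference,

$$L_i \leftarrow Q_{ij}^{(l)} + R_{ji}^{(l)},$$

which corresponds to the sum of the buffered variable value and the updated check node value described in step 3.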
Compared with traditional methods, the invention has the following advantages: strong generality, being applicable to decoding with any quasi-cyclic matrix; high speed, since the hardware implementation omits the separate variable node iteration and pipelines the whole process, greatly reducing the iteration time; and a controllable hardware area, since if hardware resources are limited the decoder can still be fully constructed by reducing the size of the expansion factor, without affecting its structure or function.
Drawings
FIG. 1 is a flow chart of a decoder decoding algorithm according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of the decoder structure (the structure of one of the parallel modules) according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Memory resource consumption: a set of ROMs is required to store the positions and sizes of the non-zero elements of the matrix; a set of variable node RAMs is used to store the values of the variable nodes, the number of variable node RAMs being equal to the size of the expansion factor; a set of check node RAMs is used to store the values of the check nodes; and a set of temporary buffer RAMs is used to store the temporary comparison values between rows, the number of temporary buffer RAMs being equal to the product of the expansion factor and the maximum number of non-zero elements per row of the matrix. In the formulas below, i denotes the column index, j the row index, l the iteration number, and P the initial value.
Step 1: an initial value is selected according to the estimated error rate, assigned to the variable nodes, and stored into the variable node RAMs in a pipelined manner, grouped by the expansion factor. The initialization formula is as follows:
[initialization formula, given as an image in the original publication]
step 2: the parallelism is sized according to the size of the spreading factor, such as a matrix with an IEEE spreading factor of 96, a base matrix of 12X24 size, the variable nodes have a number of RAMs of 96 and a depth of 24, each RAM representing one column in the spreading factor, a total of 96 columns, taking out variable values of corresponding depths according to the positions of nonzero elements in the matrix, sequencing according to the offset, making a difference value with the value of the corresponding check node (if the difference value is the first iteration, subtracting 0 to complete the update of the variable node), comparing with the value extracted from the position of the next non-zero element in the same row, including symbol updating and conversion of the complement original code, wherein the comparison frequency is equal to the maximum value of the number of the non-zero elements in each row of the matrix, if the number of the non-zero elements in the row is less than the maximum value of the matrix, and adding the maximum value to the positions of the missing nonzero elements for comparison, and not influencing the final iteration result. And (3) performing parallel comparison on 96 rows, finding the minimum value and the secondary minimum value in each row after the comparison is finished, recording the position of the minimum value, storing the check node RAM at the corresponding position of the minimum value into the RAMs at the other positions of the secondary minimum value, and storing the check node RAM into the minimum value, wherein the dereferencing, the comparison and the storage all adopt a pipeline operation mode, so that the fastest speed is ensured.
The check node update formula is as follows:
[check node update formula, given as an image in the original publication]
For the first iteration:
[first-iteration formula, given as an image in the original publication]
Step 3: during the comparisons, the variable values that were read out are stored in the buffer RAMs; after the comparisons are finished, the variable value at each position is summed with the updated value of the corresponding check node and stored back into the variable node RAM, which completes the variable update for use in the next check node update or in the decoding decision.
The variable node update formula is as follows:
[variable node update formula, given as an image in the original publication]
Step 4: one iteration consists of as many layer updates as there are base-matrix rows. After each base-matrix row is processed, the values of the variable nodes have already been updated; this is the layered min-sum scheme, which converges quickly and greatly reduces the number of iterations. After all base-matrix rows have been iterated over, a decoding decision can be made directly on the current values of the variable nodes, and the number of values judged at one time is determined by the size of the expansion factor. If the expansion factor is 96, then every 96 columns are judged in parallel (each value is decided as 0 or 1 according to the sign bit), and a decoding decision can be made after each group of judgements. For a quantum key distribution system, the decoding decision is the multiplication of the variable vector by the matrix, one multiplication per group of 96; because the matrix is quasi-cyclic, this multiplication only requires cyclically shifting the variables according to the offsets. The judgement and decision processes are likewise pipelined, greatly reducing the time they require.
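To illustrate the decision step, the following sketch makes the hard decision on each group of z values from the sign and then evaluates the parity checks of a quasi-cyclic matrix using only cyclic shifts and XOR, as described above. The names and the cyclic-shift convention are assumptions, not the patent's hardware.

```python
import numpy as np

def qc_decision_and_check(llr, base, z):
    """Hard decision per group of z values, then parity check via cyclic shifts."""
    # decide 0/1 from the sign of each variable node value (non-negative -> bit 0)
    bits = (llr < 0).astype(np.uint8)

    # each base-matrix row corresponds to z parity checks; a non-negative entry s
    # contributes the j-th group of bits cyclically shifted by s
    for m in range(base.shape[0]):
        acc = np.zeros(z, dtype=np.uint8)
        for j in range(base.shape[1]):
            s = base[m, j]
            if s >= 0:
                acc ^= np.roll(bits[j * z:(j + 1) * z], -s)
        if acc.any():                         # some check in this layer fails
            return bits, False
    return bits, True                         # all parity checks satisfied
```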
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (4)

1. An implementation method of an LDPC decoder based on hardware implementation, the hardware comprising:
a set of ROMs for storing the positions and sizes of the non-zero elements of the matrix;
a set of variable node RAMs for storing the values of the variable nodes, the number of variable node RAMs being equal to the size of the expansion factor;
a set of check node RAMs for storing the values of the check nodes;
a set of temporary buffer RAMs for storing temporary comparison values between rows, the number of temporary buffer RAMs being equal to the product of the expansion factor and the maximum number of non-zero elements per row of the matrix;
and a subtracter, a comparator and an adder arranged in sequence;
the LDPC decoder implementation method comprises the following steps:
step 1, selecting an initial value according to the estimated error rate, assigning it to the variable nodes, grouping them by the expansion factor, and storing them into the variable node RAMs in a pipelined manner;
step 2, determining the degree of parallelism according to the size of the expansion factor, each variable node RAM representing one column within the expansion factor; reading out the variable values at the corresponding depths according to the positions of the non-zero elements of the matrix, sorting them according to the offsets, and subtracting the values of the corresponding check nodes; comparing the result with the value read from the position of the next non-zero element in the same row, the number of comparisons being equal to the maximum number of non-zero elements per row of the matrix; after the comparisons are finished, finding and recording the minimum and second-minimum value in each row, then writing the second-minimum value into the check node RAM at the position of the minimum value and the minimum value into the other positions; reading, comparing and storing all being performed as pipeline operations;
step 3, storing the variable values that were read out in the variable node temporary buffer RAMs during the comparisons; after the comparisons are finished, summing each buffered variable value with the updated value of the corresponding check node and storing the result back into the variable node RAM, completing the variable update for use in the next check node update or in the decoding decision;
and step 4, the values of the variable nodes being updated after each base-matrix row is processed, and a decoding decision being made directly on the values of the variable nodes after all base-matrix rows have been iterated over.
2. The method for implementing the LDPC decoder according to claim 1, wherein the initialization formula of step 1 is
[initialization formula, given as an image in the original publication]
wherein i is the column index, j is the row index, l is the iteration number, and P is the initial value.
3. The method for implementing the LDPC decoder according to claim 1, wherein the check node update formula of step 2 is as follows
[check node update formula, given as an image in the original publication]
and, for the first iteration,
[first-iteration formula, given as an image in the original publication]
wherein i is the column index, j is the row index, and l is the iteration number.
4. The method for implementing the LDPC decoder according to claim 1, wherein the variable node update formula of step 3 is as follows
[variable node update formula, given as an image in the original publication]
wherein i is the column index, j is the row index, l is the iteration number, and P is the initial value.
CN202011160601.0A 2020-10-27 2020-10-27 LDPC decoder implementation method based on hardware implementation Pending CN112187289A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011160601.0A CN112187289A (en) 2020-10-27 2020-10-27 LDPC decoder implementation method based on hardware implementation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011160601.0A CN112187289A (en) 2020-10-27 2020-10-27 LDPC decoder implementation method based on hardware implementation

Publications (1)

Publication Number Publication Date
CN112187289A (en) 2021-01-05

Family

ID=73923774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011160601.0A Pending CN112187289A (en) 2020-10-27 2020-10-27 LDPC decoder implementation method based on hardware implementation

Country Status (1)

Country Link
CN (1) CN112187289A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100192044A1 (en) * 2007-07-18 2010-07-29 Dong Bai Qc-ldpc code decoder and corresponding decoding method
CN102664638A (en) * 2012-05-31 2012-09-12 中山大学 FPGA (Field Programmable Gate Array) realization method for multi-code-length LDPC (Low Density Parity Check) code decoder on basis of hierarchical NMS (Network Management System) algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination