CN109921877B - LDPC code parallel decoding method based on CUDA architecture under AWGN channel - Google Patents

LDPC code parallel decoding method based on CUDA architecture under AWGN channel

Info

Publication number
CN109921877B
CN109921877B (application CN201811596316.6A)
Authority
CN
China
Prior art keywords
gpu
node
check
decoding
ldpc code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811596316.6A
Other languages
Chinese (zh)
Other versions
CN109921877A (en)
Inventor
鲁邹晨
程诚
王涛
刘进
郭鲁皓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 20 Research Institute
Original Assignee
CETC 20 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 20 Research Institute filed Critical CETC 20 Research Institute
Priority to CN201811596316.6A priority Critical patent/CN109921877B/en
Publication of CN109921877A publication Critical patent/CN109921877A/en
Application granted granted Critical
Publication of CN109921877B publication Critical patent/CN109921877B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Error Detection And Correction (AREA)

Abstract

The invention provides an LDPC code parallel decoding method based on the CUDA architecture under an AWGN channel. The method fully exploits the high parallelism and the large number of processing cores of a Graphics Processing Unit (GPU): in the iterative decoding algorithm of an LDPC code, different codewords, and the nodes corresponding to different check equations, process their messages independently, so the algorithm is well suited to a fully parallel platform. First, the decoding algorithm is a direct conversion of the original serial CPU decoder and is strictly consistent with it in principle, so no error-rate performance is lost. Second, the invention compresses and stores the check matrix H in the GPU's constant memory, effectively reducing data-access latency. Finally, the designed parallel decoding scheme invokes a large number of parallel threads on the GPU simultaneously and cooperatively processes the computationally expensive CN-node and VN-node update steps of the decoder, effectively reducing the decoding latency of the LDPC code.

Description

LDPC code parallel decoding method based on CUDA architecture under AWGN channel
Technical Field
The invention relates to the field of channel coding, in particular to a parallel decoding algorithm for low-density parity-check (LDPC) codes.
Background
The LDPC code is a linear block code whose performance approaches the Shannon limit; thanks to its excellent error-correction capability and low hardware complexity, it is widely used in many communication systems. In low signal-to-noise-ratio scenarios the LDPC code achieves a low bit error rate, and it exhibits no error floor at high signal-to-noise ratio. Implementing efficient LDPC codecs has gradually become a hotspot of channel-coding research, and with the rapid growth of computing power, researchers place ever higher demands on codec data-processing speed. Because the encoding circuit of a common LDPC code is comparatively simple, research focuses mainly on reducing decoding complexity and raising throughput; a traditional CPU platform often suffers large decoding latency when decoding LDPC codes with long code length and many iterations.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a parallel decoding scheme for LDPC codes that fully exploits the high parallelism and the large number of processing cores of a Graphics Processing Unit (GPU): in the iterative decoding algorithm of an LDPC code, different codewords, and the nodes corresponding to different check equations, process their messages independently, so the scheme is well suited to a fully parallel platform.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
step 1, for each non-zero element of the LDPC code check matrix H of size m × n, complete channel initialization on the CPU: Lq_nm = LLR_n = L(x_n|y_n) = log(P(x_n=0|y_n) / P(x_n=1|y_n)) = 2y_n/σ², where Lq_nm represents the variable-node message and LLR_n the channel initial value of bit n;
step 2, allocate memory on the GPU and store the check matrix H in the GPU's constant memory in compact form, so that only 4 bytes are needed per matrix element: the first two bytes are the row mark and column mark of the element, the third byte is the shift value of the sub-matrix relative to the identity matrix, and the last byte indicates whether the current element is 0; then transfer the channel-initialization LLR values to the GPU;
step 3, on the GPU, for each CN node m and each VN node v ∈ V(m), calculate the check-node extrinsic message

Lr_mv = k · ( ∏_{v'∈V(m)\v} a_v'm ) · min_{v'∈V(m)\v} |Lq_v'm|

where a_v'm = sign(Lq_v'm), V(m) represents the set of VN nodes participating in the m-th check equation, and V(m)\v represents that set with element v removed;
after the functional details of the CNP core are defined on the GPU, the processing of the CN nodes is mapped to the CNP core and a thread grid Grid1 is allocated to it; the GPU thread resources used at each level of Grid1 are declared from the codeword parameters of the LDPC code, and the thread block size is set to the row weight of the LDPC check matrix; the rows of the check matrix H are processed in parallel, and the several variable nodes associated with each row compute their messages independently in parallel;
step 4, on the GPU, for each variable node v and each m ∈ M(v), calculate

Lq_vm = LLR_v + Σ_{m'∈M(v)\m} Lr_m'v

and

LQ_v = LLR_v + Σ_{m∈M(v)} Lr_mv

where M(v) represents the set of CN nodes connected to the v-th VN node and M(v)\m represents that set with element m removed;
after the variable-node processing kernel function is defined on the GPU, the processing of the variable nodes is mapped to another kernel function, the VNP core, and a thread grid Grid2 is allocated to it for acceleration; the columns of the check matrix H are processed in parallel, and the several check nodes associated with each column operate independently in parallel;
step 5, after the iteration, if the message value LQ_n ≥ 0, decide that bit ĉ_n is 0, otherwise 1; each time step 4 has been executed, form the decision codeword ĉ = (ĉ_1, ĉ_2, …, ĉ_n); if the syndrome satisfies the check constraint

H · ĉ^T = 0

the iteration terminates and ĉ is output as the decoding result; otherwise go to step 2; if decoding has not succeeded within the preset maximum number of iterations, failure is declared and the decoding iteration ends.
The invention has the following beneficial effects. First, the decoding algorithm is a direct conversion of the original serial CPU decoding, and the two are strictly consistent in principle, so no error-rate performance is lost. Second, the invention compresses and stores the check matrix H in the GPU's constant memory, effectively reducing data-access latency. Finally, the designed parallel decoding scheme invokes a large number of parallel threads on the GPU simultaneously and cooperatively processes the computationally expensive CN-node and VN-node update steps of the decoder, effectively reducing the decoding latency of the LDPC code.
Drawings
FIG. 1 is a diagram illustrating a thread block structure of a GPU.
FIG. 2 is a schematic diagram of a simulation system model for LDPC encoding and decoding.
FIG. 3 is a block diagram of an implementation of a CUDA-based LDPC parallel decoder.
Detailed Description
The present invention will be further described with reference to the following drawings and examples, which include, but are not limited to, the following examples.
Aiming at the decoding latency caused by serial iterative LDPC decoding on a traditional CPU platform, the invention provides a CUDA-based parallel decoding scheme for LDPC codes and verifies that CUDA parallel decoding loses no decoding performance while accelerating the decoding process, effectively reducing decoding latency and improving system throughput.
Under the CUDA architecture of the graphics processor, and based on a hybrid CPU/GPU programming model, the invention distributes the tasks of the coding and decoding system between the CPU and the GPU according to the characteristics of each subsystem (noise addition, modulation, encoding and decoding), and gives a concrete implementation of the parallel decoder on the GPU.
The invention improves the processing latency of the decoding part of the original LDPC coding and decoding system. Because the decoding part has high computational complexity, it is accelerated using the parallel architecture and constant-memory features of the GPU; encoding and the other modules have lower complexity than the decoding module, need no parallel processing, and are completed serially on the CPU.
For the normalized min-sum decoding algorithm of the LDPC code, the steps of CN-node processing, VN-node processing and checking of the decoding result are mapped to several independent kernel functions so that they run on the GPU, each accelerated in parallel by its own thread grid. The parallel decoding algorithm of the LDPC code mainly comprises the following steps:
The check matrix H of the LDPC code is a known m × n matrix. The codeword c = (c_1, c_2, …, c_n) is the transmitted codeword sequence; BPSK modulation produces the symbol sequence x = (x_1, x_2, …, x_n), where x_i = 1 − 2c_i, 1 ≤ i ≤ n. After transmission over the AWGN channel, the symbol sequence received at the receiver is y = (y_1, y_2, …, y_n), where y_i = x_i + n_i and n_i is AWGN channel noise with mean 0 and variance σ². From y, the codeword sequence obtained by decoding is ĉ = (ĉ_1, ĉ_2, …, ĉ_n).
Step 1, for each non-zero element of the H matrix of the LDPC code, complete channel initialization on the CPU:

Lq_nm = LLR_n = L(x_n|y_n) = log(P(x_n=0|y_n) / P(x_n=1|y_n)) = 2y_n/σ²    (1)

Lq_nm represents the variable-node message and LLR_n the channel initial value (log-likelihood ratio) of bit n;
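As an illustrative sketch only (plain Python on the host, not the patent's implementation; the function name and the use of a seeded random generator are assumptions of this sketch), the channel model and the initialization of formula (1) look like this:

```python
import random

def channel_llr(codeword, sigma, rng=random.Random(0)):
    """BPSK-modulate a binary codeword (x_i = 1 - 2*c_i), add AWGN with
    variance sigma^2, and return the initial LLRs LLR_n = 2*y_n / sigma^2
    that seed every variable-node message Lq_nm."""
    x = [1 - 2 * c for c in codeword]             # {0,1} -> {+1,-1}
    y = [xi + rng.gauss(0.0, sigma) for xi in x]  # received symbols y_i = x_i + n_i
    return [2.0 * yi / (sigma ** 2) for yi in y]  # log-likelihood ratios
```

At low noise (small sigma) the sign of each LLR recovers the transmitted bit: positive for c_n = 0, negative for c_n = 1.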
Step 2, GPU initialization: allocate memory on the GPU and store the check matrix H of the QC-LDPC code in the GPU's constant memory in compact form, so that only 4 bytes are needed per matrix element: the first two bytes are the row mark and column mark of the element, the third byte is the shift value of the sub-matrix relative to the identity matrix, and the last byte indicates whether the current element is 0. Then transfer the channel-initialization LLR values to the GPU;
Step 3, complete the check-node update on the GPU: for each CN node m and each VN node v ∈ V(m), calculate the extrinsic message:

Lr_mv = k · ( ∏_{v'∈V(m)\v} a_v'm ) · min_{v'∈V(m)\v} |Lq_v'm|    (2)

where a_v'm = sign(Lq_v'm), Lr_mv represents the check-node extrinsic message, V(m) represents the set of VN nodes participating in the m-th check equation, and V(m)\v represents that set with element v removed.
After the functional details of the CNP core are defined on the GPU, the above CN-node processing is mapped to a CUDA kernel (the CNP core), a thread grid Grid1 is allocated to it, and the GPU thread resources used at each level of Grid1 are declared from the codeword parameters of the LDPC code, with the thread block size set to the row weight of the LDPC check matrix. The multithreading of the GPU provides the acceleration: the rows of the check matrix H are processed in parallel, and the several variable nodes associated with each row compute their messages independently in parallel.
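The check-node update of formula (2) can be sketched serially as follows (an illustrative sketch: the function name, the dictionary representation of one row of H, and the default k = 0.75 taken from the test setup later in the document are assumptions; on the GPU, each row of H is handled by one thread block of the CNP core, with one thread per outgoing message):

```python
def cn_update(Lq_row, k=0.75):
    """Normalized min-sum update for one check node m.
    Lq_row maps each variable node v in V(m) to its incoming message Lq_vm.
    Each outgoing message Lr_mv combines the sign product and the minimum
    magnitude over V(m) \\ {v}, scaled by the correction factor k."""
    Lr = {}
    for v in Lq_row:
        others = [Lq_row[u] for u in Lq_row if u != v]
        sign = 1.0
        for val in others:
            sign *= 1.0 if val >= 0 else -1.0
        Lr[v] = k * sign * min(abs(val) for val in others)
    return Lr
```

For example, Lq_row = {0: 2.0, 1: -1.0, 2: 4.0} yields Lr = {0: -0.75, 1: 1.5, 2: -0.75}.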
Step 4, complete the variable-node update on the GPU:
for each variable node v and each m ∈ M(v), calculate:

Lq_vm = LLR_v + Σ_{m'∈M(v)\m} Lr_m'v    (3)

LQ_v = LLR_v + Σ_{m∈M(v)} Lr_mv    (4)

M(v) represents the set of CN nodes connected to the v-th VN node, and M(v)\m represents that set with element m removed.
After the variable-node processing kernel function is defined on the GPU, the processing of the variable nodes is mapped to another kernel function, the VNP core, and a thread grid Grid2 is allocated to it for acceleration; that is, the columns of the check matrix H are processed in parallel, and the several check nodes associated with each column operate independently in parallel.
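Formulas (3) and (4) can be sketched per column in the same spirit (function name and data layout are again assumptions of this sketch); computing the full sum once and subtracting each incoming message implements the exclusion over M(v) \ {m}:

```python
def vn_update(llr_v, Lr_col):
    """Variable-node update for one variable node v.
    Lr_col maps each check node m in M(v) to its incoming message Lr_mv.
    Returns ({m: Lq_vm}, LQ_v): formula (4) gives the total LQ_v, and
    formula (3) follows by subtracting the one excluded term."""
    LQ_v = llr_v + sum(Lr_col.values())         # formula (4)
    Lq = {m: LQ_v - Lr_col[m] for m in Lr_col}  # formula (3)
    return Lq, LQ_v
```

For llr_v = 1.0 and Lr_col = {0: 0.5, 1: -2.0}, this gives LQ_v = -0.5 and Lq = {0: -1.0, 1: 1.5}.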
Step 5, attempt a decoding decision:
after the iteration, if the message value LQ_n ≥ 0, decide that bit ĉ_n is 0, otherwise 1. Each time step 4 has been executed, form the decision codeword ĉ = (ĉ_1, ĉ_2, …, ĉ_n). If the syndrome satisfies the check constraint:

H · ĉ^T = 0    (5)

the iteration terminates and ĉ is output as the decoding result; otherwise execution returns to the second step. If decoding has not succeeded within the preset maximum number of iterations, failure is declared and the decoding iteration ends.
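The decision and syndrome test of step 5 can be sketched as follows (illustrative only; representing H as a dense list of 0/1 rows is an assumption of the sketch, not the compact 4-byte storage the patent uses on the GPU):

```python
def decide_and_check(LQ, H):
    """Hard decision c_hat_n = 0 if LQ_n >= 0 else 1, then the parity
    check H * c_hat^T = 0 (mod 2) over all rows of H."""
    c_hat = [0 if lq >= 0 else 1 for lq in LQ]
    ok = all(sum(h * c for h, c in zip(row, c_hat)) % 2 == 0 for row in H)
    return c_hat, ok
```

When ok is True the iteration terminates and c_hat is the decoder output; otherwise another iteration of the CN and VN updates runs.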
The complete LDPC coding and decoding system is built with the CPU and GPU platforms working jointly; the GPU and CPU parts are compiled with the CUDA 6.0 toolkit and Microsoft Visual Studio 2010 respectively. The codeword is the rate-0.5 (1024, 512) LDPC code described in the CCSDS standard, the maximum number of decoding iterations is set to 64, and the correction factor of the normalized min-sum decoding algorithm is k = 0.75. Wherein:
On the CPU platform, an original information bit signal u is generated and encoded with the rate-0.5 (1024, 512) LDPC code described in the CCSDS standard, forming code bits c ∈ {0, 1}; BPSK modulation (x = 2c − 1) yields the channel transmit symbols x ∈ {−1, 1}, which are transmitted over the AWGN channel with added white Gaussian noise, so the noise-corrupted signal y = x + n is obtained at the channel receiver. The channel-initialization LLR values are transferred to the GPU with the cudaMemcpy() function, and iterative decoding begins in the decoder on the GPU platform.
On the GPU platform, after GPU initialization, CNP-core processing and VNP-core processing are completed in sequence, the decoded codeword ĉ is returned to the CPU:
First, GPU initialization: the cudaMalloc() function is called to allocate memory for the check matrix H and the matrix is stored on the GPU, so that all the information the threads access and update resides in GPU memory. The H matrix of a quasi-cyclic LDPC code is highly structured and can be divided into sub-matrices of size Z of three types: the all-zero matrix, the identity matrix, and cyclic shifts of the identity matrix, which is why 4 bytes suffice to store a matrix element. In the system implementation the elements of the LDPC check matrix are read-only constants; they are stored in the GPU's constant memory, whose fast data-broadcast property optimizes access latency when the data are read.
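The 4-byte element descriptor can be sketched as below. The patent fixes only the meaning of the bytes (row mark, column mark, shift value, zero flag); the exact field order and the one-byte-per-field layout here are assumptions of this sketch:

```python
def pack_entry(block_row, block_col, shift, nonzero):
    """Pack one Z x Z circulant sub-block descriptor of a QC-LDPC check
    matrix into 4 bytes; every field must fit in 0..255."""
    return bytes([block_row, block_col, shift, 1 if nonzero else 0])

def unpack_entry(b):
    """Inverse of pack_entry: recover (row, col, shift, nonzero)."""
    block_row, block_col, shift, flag = b
    return block_row, block_col, shift, flag == 1
```

An array of such 4-byte records is what would be copied into the GPU's constant memory, in place of the full binary matrix.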
Then, using a cudaMemcpy () function to transfer the LLR value of the channel initialization to the GPU, wherein the instruction for transferring parameters in the CUDA program is:
cudaMemcpy(dev_lratio, lratio, n * sizeof(double), cudaMemcpyHostToDevice);
The four parameters indicate, in order: the destination of the transfer, a device pointer (dev_, i.e. on the GPU); the source, a host pointer (i.e. on the CPU); the number of bytes copied (the LLR values are double-precision data); and the transfer direction, here from host to device, i.e. from CPU to GPU. When the decoding result is returned, the parameter transfer from the device (GPU) back to the host (CPU) is likewise accomplished with the cudaMemcpy() function.
The second step: the CNP core. In each iteration, the CN nodes are updated from the VN nodes associated with each row of the LDPC check matrix H; every GPU thread simultaneously selects its associated variable-node messages, computes independently, and passes the values to the check nodes. For the rate-0.5 (1024, 512) LDPC code, the row weight of the check matrix H is 6, so each check node connects to 6 variable nodes. The main task of the GPU's CNP core is to compute, for all check nodes simultaneously, the messages each returns to its six associated variable nodes.
Each CN node is handled by one thread block, and each thread block contains 6 threads. Before using the GPU's thread grid and thread blocks, the kernel's execution configuration must be set up as follows, i.e. the sizes of the grid and the thread block must be declared:
dim3 dimBlock(x1,y1,1);
The declared block size is x1 × y1. After the sizes of the thread block and the grid are declared, the functional details of the check-node processing kernel function CNP are defined, and the kernel is then executed. The instruction for the GPU to execute the kernel function is:
CNP<<<dimGrid1, dimBlock>>>( … );
The first parameter inside the triple angle brackets is the number of blocks in the grid, the second is the number of threads per block, and the parentheses hold the function arguments of the CNP core. The kernel function VNP that processes the variable nodes is launched in the same way, and the VN-node decision can likewise be performed in parallel on the GPU.
The third step: the VNP core. After the CN-node update, the VN nodes are updated from the CN nodes associated with each column of the LDPC check matrix H; every GPU thread simultaneously selects its associated check-node messages, computes independently, and passes the values to the variable nodes.
After the iteration finishes, the VN-node decision is made on the GPU, each thread deciding one bit. Message passing during decoding uses the GPU's coalesced global-memory access pattern, which achieves higher memory bandwidth.
The fourth step: bit return. The decoded, decided bits are copied from the GPU back to the CPU and the bit error rate is computed; all the space allocated on the GPU is then freed.
On the CPU platform, ĉ is compared with the transmitted bits; the number of erroneous bits divided by the total number of transmitted bits gives the bit error rate, i.e. the decoding performance of the parallel decoding algorithm.
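The bit-error-rate statistic computed on the CPU can be sketched as follows (the function name and the list-of-frames layout are assumptions of this sketch):

```python
def bit_error_rate(decoded_frames, sent_frames):
    """Ratio of erroneous bits to total transmitted bits over all frames,
    as counted on the CPU after the decisions are copied back from the GPU."""
    errors = sum(d != s
                 for dec, snt in zip(decoded_frames, sent_frames)
                 for d, s in zip(dec, snt))
    total = sum(len(f) for f in sent_frames)
    return errors / total
```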
When decoding time is analyzed and compared, the CPU platform times decoding with the clock() function, which reaches millisecond resolution, while the GPU platform times decoding with the CUDA profiler timing function in the CUDA toolkit.
And (3) testing results:
the method comprises the following steps of (I) correctness verification:
when the variation range of the signal-to-noise ratio is 1dB-2.6dB, the BER performance curves of the adopted CUDA parallel decoding algorithm and the CPU serial decoding algorithm are basically coincident and are obviously reduced along with the increase of the signal-to-noise ratio, and the error code rate reaches 6 multiplied by 10 when the signal-to-noise ratio is 2.6dB-6The parallel decoding algorithm provided by the invention is proved not to cause error code performance loss.
(II) validity verification:
1. Check-matrix storage mode: 10000 frames of the (1024, 512) LDPC code are tested, one codeword is input per decoding, the GPU allocates 256 threads in total, and the check matrix is stored either in CPU memory or in the GPU's constant memory. Averaging the decoding latency of the two schemes over many runs shows that the latency with H in the GPU's constant memory is 71% of the latency with H in CPU memory. The fast data-broadcast property of GPU constant memory saves unnecessary host-device data-read latency, so decoding throughput improves by a factor of 1.3 to 1.6, as shown in the following table:
[Table: decoding latency with H stored in CPU memory versus in GPU constant memory]
2. Parallelism of the algorithm: 10000 frames of the (1024, 512) LDPC code are tested with the SNR fixed at 3.0 dB. The GPU allocates 256 threads in total, and four decoding schemes of different parallelism are timed: scheme 1 is traditional CPU decoding, scheme 2 parallelizes only the CN-node processing on the GPU, scheme 3 parallelizes only the VN-node processing on the GPU, and scheme 4 puts both on the GPU in parallel. The results show that schemes 2, 3 and 4 all clearly accelerate the original CPU decoding: parallelizing only the CN-node or only the VN-node processing gives a speedup of about 2.5×, while the fully parallel scheme achieves a speedup of 6.1×, as shown in the following table:
[Table: decoding latency and speedup of schemes 1-4]
3. Number of GPU threads invoked: 10000 frames are tested at an SNR of 3.0 dB, one codeword is fed per decoding, and different numbers of threads are allocated for decoding it. The results show that with a small number of GPU threads (32), the GPU parallel decoding time is slightly longer than the original CPU serial decoding, because so few threads accelerate decoding very little and cannot offset the host-device data-transfer latency. As the number of GPU threads used for parallel decoding grows, the high-speed character of CUDA-based parallel decoding gradually emerges. Within a certain range of thread counts, the decoding speedup grows approximately linearly with the number of GPU threads used, as shown in the following table:
[Table: decoding latency versus number of GPU threads]
The test results show that the decoding algorithm of the invention increases the decoding speed of the LDPC code and effectively improves system throughput.

Claims (1)

1. An LDPC code parallel decoding method based on CUDA architecture under AWGN channel is characterized by comprising the following steps:
step 1, for each non-zero element of the LDPC code check matrix H of size m × n, complete channel initialization on the CPU: Lq_nm = LLR_n = L(x_n|y_n) = log(P(x_n=0|y_n) / P(x_n=1|y_n)) = 2y_n/σ², where Lq_nm represents the variable-node message, LLR_n the channel initial value of bit n, x_n the n-th symbol of the codeword transmitted after BPSK modulation, y_n the n-th bit of the codeword sequence received after channel transmission, and σ² the variance of the AWGN channel noise;
step 2, allocate memory on the GPU and store the check matrix H in the GPU's constant memory in compact form, so that only 4 bytes are needed per matrix element: the first two bytes are the row mark and column mark of the element, the third byte is the shift value of the sub-matrix relative to the identity matrix, and the last byte indicates whether the current element is 0; then transfer the channel-initialization LLR values to the GPU;
step 3, on the GPU, for each CN node m and each VN node v ∈ V(m), calculate the check-node extrinsic message

Lr_mv = k · ( ∏_{v'∈V(m)\v} a_v'm ) · min_{v'∈V(m)\v} |Lq_v'm|

where a_v'm = sign(Lq_v'm); V(m) represents the set of VN nodes participating in the m-th check equation, the m-th check equation being the one corresponding to the m-th row of the LDPC check matrix, and V(m)\v represents that set with element v removed; in the check-node update, for every check node m and every adjacent variable node v ∈ V(m), solving for the message Lr_mv passed from check node m to variable node v requires the likelihood-ratio messages Lq_v'm passed from all the variable nodes v' ∈ V(m)\v connected to check node m other than v, and Lr_mv is computed from the minimum magnitude and the sign bits of those Lq_v'm;
after the functional details of the CNP core are defined on the GPU, the CN-node processing is mapped to the CNP core, a thread grid Grid1 is allocated to it, and the GPU thread resources used at each level of Grid1 are declared from the codeword parameters of the LDPC code, with the thread block size set to the row weight of the LDPC check matrix; the rows of the check matrix H are processed in parallel, and the several variable nodes associated with each row compute their messages independently in parallel;
step 4, on the GPU, for each variable node v and each m ∈ M(v), calculate

Lq_vm = LLR_v + Σ_{m'∈M(v)\m} Lr_m'v

and

LQ_v = LLR_v + Σ_{m∈M(v)} Lr_mv

where M(v) represents the set of CN nodes connected to the v-th VN node, M(v)\m represents that set with element m removed, and LLR_v represents the log-likelihood ratio computed from the channel received value y_v; in the variable-node update, for every variable node v and every adjacent check node m ∈ M(v), solving for the message Lq_vm passed from variable node v to check node m requires the likelihood-ratio messages passed to variable node v by all the other check nodes m' ∈ M(v)\m connected to it, except check node m; for variable node v, solving for the total log-likelihood ratio LQ_v used for the decision requires the likelihood-ratio messages Lr_mv passed to variable node v by every adjacent check node m ∈ M(v);
after the variable-node processing kernel function is defined on the GPU, the processing of the variable nodes is mapped to another kernel function, the VNP core, and a thread grid Grid2 is allocated to it for acceleration; the columns of the check matrix H are processed in parallel, and the several check nodes associated with each column operate independently in parallel;
step 5, after the iteration, if the message value LQ_n ≥ 0, decide that bit ĉ_n is 0, otherwise 1; each time step 4 has been executed, form the decision codeword ĉ = (ĉ_1, ĉ_2, …, ĉ_n); if the syndrome satisfies the check constraint

H · ĉ^T = 0

the iteration terminates and ĉ is output as the decoding result; otherwise go to step 2; if decoding has not succeeded within the preset maximum number of iterations, failure is declared and the decoding iteration ends.
CN201811596316.6A 2018-12-26 2018-12-26 LDPC code parallel decoding method based on CUDA architecture under AWGN channel Active CN109921877B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811596316.6A CN109921877B (en) 2018-12-26 2018-12-26 LDPC code parallel decoding method based on CUDA architecture under AWGN channel

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811596316.6A CN109921877B (en) 2018-12-26 2018-12-26 LDPC code parallel decoding method based on CUDA architecture under AWGN channel

Publications (2)

Publication Number Publication Date
CN109921877A CN109921877A (en) 2019-06-21
CN109921877B true CN109921877B (en) 2021-11-12

Family

ID=66959907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811596316.6A Active CN109921877B (en) 2018-12-26 2018-12-26 LDPC code parallel decoding method based on CUDA architecture under AWGN channel

Country Status (1)

Country Link
CN (1) CN109921877B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111817728B (en) * 2020-08-03 2022-03-01 华中科技大学 Simulation system for realizing LDPC coding and decoding based on hardware and working method thereof

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102932003A (en) * 2012-09-07 2013-02-13 上海交通大学 Accelerated QC-LDPC (Quasi-Cyclic Low-Density Parity-Check Code) decoding method based on GPU (Graphics Processing Unit) framework
CN103684474A (en) * 2012-08-31 2014-03-26 中国科学院上海高等研究院 Realization method of high-speed low density parity code (LDPC) decoder
CN104124980A (en) * 2014-07-16 2014-10-29 上海交通大学 High-speed secret negotiation method suitable for continuous variable quantum key distribution
CN106992856A (en) * 2017-03-29 2017-07-28 山西大学 The data coordinating method of extensive continuous variable quantum key distribution based on GPU

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US9312929B2 (en) * 2004-04-02 2016-04-12 Rearden, Llc System and methods to compensate for Doppler effects in multi-user (MU) multiple antenna systems (MAS)
US10425134B2 (en) * 2004-04-02 2019-09-24 Rearden, Llc System and methods for planned evolution and obsolescence of multiuser spectrum

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN103684474A (en) * 2012-08-31 2014-03-26 中国科学院上海高等研究院 Realization method of high-speed low density parity code (LDPC) decoder
CN102932003A (en) * 2012-09-07 2013-02-13 上海交通大学 Accelerated QC-LDPC (Quasi-Cyclic Low-Density Parity-Check Code) decoding method based on GPU (Graphics Processing Unit) framework
CN104124980A (en) * 2014-07-16 2014-10-29 上海交通大学 High-speed secret negotiation method suitable for continuous variable quantum key distribution
CN106992856A (en) * 2017-03-29 2017-07-28 山西大学 The data coordinating method of extensive continuous variable quantum key distribution based on GPU

Non-Patent Citations (1)

Title
"规则LDPC码在GPU上的加速译码" [Accelerated decoding of regular LDPC codes on a GPU]; 任计林 (Ren Jilin); 《西安电子科技大学学报》 (Journal of Xidian University); 2016-12-04; full text *

Also Published As

Publication number Publication date
CN109921877A (en) 2019-06-21

Similar Documents

Publication Publication Date Title
CN102412847B (en) Method and apparatus for decoding low density parity check code using united node processing
KR100846869B1 (en) Apparatus for Decoding LDPC with Low Computational Complexity Algorithms and Method Thereof
Ferraz et al. A survey on high-throughput non-binary LDPC decoders: ASIC, FPGA, and GPU architectures
US8429512B2 (en) Reduced complexity LDPC decoder
CN103208995B (en) A kind of premature termination method of low density parity check code decoding
US10848182B2 (en) Iterative decoding with early termination criterion that permits errors in redundancy part
CN106452455B (en) Dynamic decoding method based on OpenCL mobile device QC-LDPC
CN107968657B (en) Hybrid decoding method suitable for low-density parity check code
CN108462496B (en) LDPC decoder based on random bit stream updating
CN108092673B (en) BP iterative decoding method and system based on dynamic scheduling
CN112134570A (en) Multi-mode LDPC decoder applied to deep space communication
US10892783B2 (en) Apparatus and method for decoding polar codes
CN109586733A (en) A kind of LDPC-BCH interpretation method based on graphics processor
CN115037310B (en) 5G LDPC decoder performance optimization method and architecture based on random computation
CN109921877B (en) LDPC code parallel decoding method based on CUDA architecture under AWGN channel
CN101136639B (en) Systems and methods for reduced complexity ldpc decoding
WO2017045142A1 (en) Decoding method and decoding device for ldpc truncated code
US20160049962A1 (en) Method and apparatus of ldpc encoder in 10gbase-t system
CN108933604B (en) Variable node processing method and device
US10727869B1 (en) Efficient method for packing low-density parity-check (LDPC) decode operations
CN113381769B (en) Decoder based on FPGA
CN114584151B (en) Decoding method of analog decoding circuit stopping criterion based on probability calculation
CN111835363B (en) LDPC code decoding method based on alternate direction multiplier method
CN110708077B (en) LDPC code large number logic decoding method, device and decoder
CN113271109A (en) Iterative cycle data storage method and system in LDPC decoding process

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant