CN106911336B - High-speed parallel low-density parity check decoder with multi-core scheduling and decoding method thereof - Google Patents


Info

Publication number: CN106911336B (application CN201710031380.9A; earlier publication CN106911336A)
Authority: CN (China)
Prior art keywords: decoding, core, module, decoded, cores
Legal status: Active, granted (the legal status is an assumption, not a legal conclusion)
Inventors: 殷柳国, 张远东, 葛广君
Assignee (original and current): Tsinghua University
Other languages: Chinese (zh)
Application filed by Tsinghua University; priority to CN201710031380.9A

Classifications

    • H — ELECTRICITY
    • H03 — ELECTRONIC CIRCUITRY
    • H03M — CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 — Coding, decoding or code conversion, for error detection or error correction; coding theory basic assumptions; coding bounds; error probability evaluation methods; channel models; simulation or testing of codes
    • H03M13/03 — Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05 — … using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/11 — … using block codes … using multiple parity bits
    • H03M13/1102 — Codes on graphs and decoding on graphs, e.g. low-density parity check [LDPC] codes
    • H03M13/1105 — Decoding
    • H03M13/1128 — Judging correct decoding and iterative stopping criteria other than syndrome check and upper limit for decoding iterations

Abstract

The invention relates to a high-speed parallel low-density parity check (LDPC) decoder with multi-core scheduling and a decoding method thereof, belonging to the technical field of wireless communication. The decoder comprises a data cache module, a multi-core scheduling module, and an LDPC parallel decoding core consisting of a plurality of decoding cores, connected in sequence. In the method, the multi-core scheduling module monitors the working state of the back-end parallel decoding cores and assigns words to be decoded, each of a single codeword length, to decoding cores that have finished decoding and are idle. The multi-core scheduling module checks whether each decoding core has finished; if the decoding result passes verification, the result is output to the multi-core scheduling module. After decoding is finished, the multi-core scheduling module outputs the decoding results of all decoding cores to the data cache module in the same order in which the codewords were allocated, and the data cache module outputs the decoded data. By adding a novel multi-core scheduling module, the invention effectively improves the operating efficiency and decoding speed of the decoder.

Description

High-speed parallel low-density parity check decoder with multi-core scheduling and decoding method thereof
Technical Field
The invention belongs to the technical field of wireless communication, and relates to a high-speed parallel low-density parity check decoder for multi-core scheduling and a decoding method thereof.
Background
With the rapid development of the economy, both civil and military applications are placing higher demands on resolution and precision, and a mature, stable high-resolution Earth observation system is an important support for national defense and many civil fields. The United States developed high-resolution observation satellites first and leads the field worldwide. Because space resources are limited and data are exceptionally valuable in the big-data era, the countries leading in high-resolution technology are unwilling to share these resources; independent high-resolution research and development therefore brooks no delay.
With the gradual advance of the high-resolution major project (the major national project on a high-resolution Earth observation system), the number of spacecraft detectors of various kinds is growing rapidly, the data rate of detection information keeps rising, and the volume of generated data is enormous. At present, 300 Mbps satellite high-speed data transmission systems are widely deployed; transmission rates above 3 Gbps are already required, and rates will continue to climb to 10 Gbps or even 30 Gbps in the future. A transmission system that can meet such high-speed requirements is therefore an important support for the high-resolution project. However, data transmission rate is directly proportional to system power: raising the rate inevitably raises the power, while system resources themselves are strictly limited. How to process ever-faster data within limited resources has thus become a key problem for high-speed data transmission systems.
Low-density parity check (LDPC) codes are attracting increasing attention because their codeword construction achieves a coding gain very close to the theoretical limit. Their decoding complexity is low, they can be decoded in parallel, and decoding errors are easy to detect, so they can satisfy the demands of high-speed data transmission with high coding gain and are increasingly used in high-speed satellite data transmission systems. In general, a single decoding core has limited throughput; to achieve decoding speeds matching high-speed data transmission, several decoding cores usually have to work in parallel. As shown in fig. 1, a conventional high-speed LDPC decoder comprises a data cache module, a multi-core scheduling module, and an LDPC parallel decoding core composed of a plurality of decoding cores, connected in sequence. Each decoding core comprises an input data cache module, a soft information storage module, a variable node array module, a control module, a check node array module, a variable node storage module, and a check node storage module, connected in sequence, together with a decoding result storage module and an output cache module connected in sequence to the variable node array module; the check node array module, the variable node storage module, and the check node storage module are each connected to the control module. All of these modules are implemented in one FPGA chip.
The decoding method of the conventional high-speed LDPC decoder works as follows. Data to be decoded first enter the data cache module (built from a FIFO inside the FPGA chip) and are sent in parallel into the LDPC parallel decoding cores according to instructions issued by the multi-core scheduling module. The parallel decoding cores are shown inside the dotted frame, and every decoding core operates in the same way: a single codeword first enters the input data cache module, whose function is to supply single-codeword information to the following stage at the decoding core's working clock frequency. The information to be decoded then enters the soft information storage module; according to the fixed iteration count (denoted N) and the decoding period set by the control module, the soft information is read out of storage and cycled through the variable node array module and the check node array module. Once the iteration count is reached, the result is output to the decoding result storage module and the output cache module, which delivers the decoding result to the following stage at the required clock duty cycle. The multi-core scheduling module splices the data together according to the number of each decoding core and outputs them directly through the data cache module. The conventional high-speed LDPC decoder is thus usually built by multi-core parallelism that simply replicates a single-core decoder, with a conventional multi-core scheduling module and data cache module at the front end controlling the whole system.
However, correct decoding in the conventional high-speed LDPC decoder depends not only on the normal operation of each parallel decoder but, more critically, on the scheduling function of the multi-core scheduling module. The conventional multi-core scheduling module is mainly responsible for sequential allocation of codewords: after the information to be decoded has been cached in the data cache module to a storage depth of one codeword length, the scheduling module reads the codewords to be decoded out of the data cache module in order, in units of the specified single-codeword length, and assigns them to the decoding cores in order of core number. Because the decoding iteration count N set by the control module inside the parallel decoding cores is fixed, the multi-core scheduling module can only obtain the decoded data of each core after waiting a fixed time, after which the data are spliced and output according to core number. The scheduling module then keeps distributing words to be decoded to the cores in the same read, distribute, and splice pattern, waiting and then outputting decoded data, and repeats this process until all words to be decoded are finished or a stop signal is received. Since the scheduling module's function in the conventional high-speed LDPC decoder is simple and its timing fixed, it can be implemented directly with a counter.
The workflow of the conventional multi-core scheduling module is shown in fig. 2, taking a four-core parallel decoder as an example. With a fixed maximum iteration count set, the multi-core scheduling module starts working once the data entering the data cache module reach one codeword. Data are read out of the cache in units of one codeword length and sent in order to decoding core 1, decoding core 2, decoding core 3, and decoding core 4. Every decoding core runs continuously; during decoding the scheduling module waits until all cores have reached the maximum iteration count, then reassembles and outputs the decoded data in the order core 1, core 2, core 3, core 4, and continues this allocation work until the system stops. Because all four cores stop decoding only after reaching the maximum iteration count, their working progress stays in lock step, and only four codewords can be decoded per decoding period.
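The fixed-iteration schedule above can be sketched as follows. This is a hypothetical model, not the patent's hardware: codewords are dealt to the cores round-robin, and every core always runs the full fixed count N before results are collected, even if the codeword converged earlier.

```python
# Sketch of the traditional fixed-iteration schedule; names and values are
# illustrative assumptions, not from the patent.
N_FIXED = 29   # fixed iteration count set in advance, example value
NUM_CORES = 4

def traditional_schedule(iters_needed):
    """iters_needed[k] = iterations codeword k would need to converge.
    Returns (total core-iterations spent, iterations wasted after convergence)."""
    spent = wasted = 0
    for start in range(0, len(iters_needed), NUM_CORES):
        for need in iters_needed[start:start + NUM_CORES]:
            spent += N_FIXED                   # the core always runs N_FIXED iterations
            wasted += max(0, N_FIXED - need)   # useless work after convergence
    return spent, wasted
```

With eight codewords needing [12, 16, 16, 29, 16, 17, 20, 16] iterations, the cores spend 8 × 29 = 232 iterations, of which 90 do no useful work.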
This method is easy to implement, but conventional parallel decoding merely replicates a single decoding core and sends the words to be decoded to each core in turn. Because each core is normally given a fixed iteration count in advance, chosen from simulation to guarantee the required error rate, the cores stay synchronized and codeword allocation is simple. In the actual decoding process, however, each core executes the full fixed iteration count even though not every codeword needs the maximum number of iterations to decode successfully; most codewords finish decoding well before the maximum is reached. For high-speed decoders these are critical issues: the fixed iteration count makes the overall decoding efficiency of the conventional parallel decoder low and wastes hardware resources in a resource-limited setting. What is needed is a novel multi-core scheduling module that controls the iteration count according to the actual decoding completion status and schedules the whole decoder as one, so that limited resources are used more fully and overall decoding efficiency is improved.
Disclosure of Invention
The invention aims to solve the problem that hardware resources are not fully utilized in the conventional high-speed LDPC decoder under limited resources, and provides a novel multi-core-scheduled high-speed parallel low-density parity check (LDPC) decoder and a decoding method thereof.
The decoder is implemented in an FPGA chip and comprises a data cache module, a multi-core scheduling module, and an LDPC parallel decoding core consisting of a plurality of decoding cores, with the data cache module and the multi-core scheduling module connected in sequence. It is characterized as follows. The data cache module is built from a FIFO inside the FPGA, with a deeper storage depth so that enough cache space is guaranteed when the next codeword arrives. The multi-core scheduling module controls the scheduling of the whole high-speed LDPC decoder: when more than one codeword of data is stored in the upstream data cache module, it sends a decoding start signal to the downstream parallel decoding cores and allocates the cached data to each core for decoding; at the same time it receives the decoding-finished signals fed back by the parallel decoding cores, checks whether each core is idle, and sends the next word to be decoded from the upstream data cache module to an idle core. After decoding is finished, the decoding results of all cores are output to the data cache module in the same order in which the codewords were allocated. Each decoding core in the LDPC parallel decoding core comprises a soft information storage module, a variable node array module, a control module, a check node array module, a variable node storage module, a check node storage module, and a decoding result storage module, with the soft information storage module, the variable node array module, and the control module connected in sequence; the variable node array module is also connected to the check node storage module and the variable node storage module.
The invention also provides a decoding method for the above multi-core-scheduled high-speed parallel low-density parity check decoder, characterized as follows:
The stream of words to be decoded first enters the data cache module. According to the working state of the back-end parallel decoding cores, the multi-core scheduling module assigns words to be decoded, each of a single codeword length, to decoding cores that have finished decoding and are idle. On receiving a word to be decoded, each decoding core first places it in the soft information storage module and decodes it by cycling through the variable node array module and the check node array module, storing each decoding result in the decoding result storage module. The multi-core scheduling module checks whether the core has finished decoding; if the decoding result passes verification, it is output to the multi-core scheduling module and the core switches to the idle state. Otherwise the core continues iterative decoding until the set maximum iteration count is reached; if the codeword still has not been decoded correctly at that point, the core is forcibly stopped, the decoding result is output, and decoding-failure information is fed back. The multi-core scheduling module then sends the next word to be decoded to an idle decoding core. After decoding is finished, the multi-core scheduling module outputs the decoding results of all decoding cores to the data cache module in the same order in which the codewords were allocated, and the data cache module outputs the decoded data.
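The per-core control flow just described (iterate, verify, stop early on success, force-stop at the maximum) can be sketched as below; `one_iteration` and `passes_check` are hypothetical stand-ins for the variable/check-node array operations and the parity verification, not functions named by the patent.

```python
MAX_ITERS = 29  # set maximum iteration count, example value

def decode_codeword(one_iteration, passes_check, soft_info, max_iters=MAX_ITERS):
    """Run decoding iterations until the check passes or the maximum count
    is reached; returns (result, iterations_used, success)."""
    state = soft_info
    for it in range(1, max_iters + 1):
        state = one_iteration(state)        # variable-node + check-node update
        if passes_check(state):
            return state, it, True          # decoded: core reports idle
    return state, max_iters, False          # forced stop: decoding failure fed back
```

A core that returns early frees its slot for the next codeword instead of idling until a fixed count elapses.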
The technical features and beneficial effects of the invention are:
1) The data cache module increases its storage depth to guarantee enough cache space when the next codeword arrives, so that a new codeword can be assigned to a decoding core as soon as that core finishes the previous one.
2) The multi-core scheduling module controls the scheduling of the whole high-speed LDPC decoder and drives the decoding cores with a non-fixed iteration count, which effectively improves decoding efficiency.
3) Because the data cache module in front of the multi-core scheduling module buffers the data to be decoded, and the scheduling module handles codeword scheduling and control, the input and output FIFOs inside each decoding core are omitted, saving hardware resources.
The invention greatly reduces the computation of the traditional method at very small hardware cost, markedly improving overall system efficiency and saving overall hardware resources.
Drawings
FIG. 1 is a block diagram of a conventional multi-core parallel LDPC decoder.
FIG. 2 is a diagram of the conventional multi-core scheduling module workflow.
FIG. 3 is a block diagram of the multi-core-scheduled high-speed LDPC decoder of the present invention.
FIG. 4 shows the distribution of iteration counts in an embodiment of the present invention.
FIG. 5 shows the relationship between iteration count and decoding performance in an embodiment of the present invention.
FIG. 6 shows the relationship between iteration count and saved computation in an embodiment of the present invention.
FIG. 7 shows the input FIFO depth increase in an embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the attached drawing figures.
The multi-core-scheduled high-speed LDPC decoder provided by the invention comprises a data cache module, a multi-core scheduling module, and an LDPC parallel decoding core consisting of a plurality of decoding cores, connected in sequence, as shown in FIG. 3, and, like the conventional LDPC decoder, can be implemented in an FPGA chip. The multi-core-scheduled high-speed LDPC decoder is characterized as follows:
1) The data cache module is built from a FIFO inside the FPGA with a deeper storage depth than the FIFO in the data cache module of a conventional LDPC decoder, so that more codeword sequence information can be buffered and enough cache space is guaranteed when the next codeword arrives; as soon as a decoding core finishes the previous codeword, a new codeword is assigned to it.
2) The multi-core scheduling module controls the scheduling of the whole high-speed LDPC decoder: when more than one codeword of data is stored in the upstream data cache module, it sends a decoding start signal to the downstream parallel decoding cores and allocates the cached data to each core for decoding; at the same time it receives the decoding-finished signals fed back by the parallel decoding cores, checks whether each core is idle, and sends the next word to be decoded from the upstream data cache module to an idle core. After decoding is finished, the decoding results of all cores are output to the data cache module in the same order in which the codewords were allocated.
3) Each decoding core in the LDPC parallel decoding core consists of a soft information storage module, a variable node array module, a control module, a check node array module, a variable node storage module, a check node storage module, and a decoding result storage module, with the soft information storage module, the variable node array module, and the control module connected in sequence; the variable node array module is also connected to the check node storage module and the variable node storage module. The control module no longer sets a fixed iteration count for the parallel decoding cores; it sets only a maximum iteration count (chosen slightly larger than the fixed count N used by the conventional control module) and controls the decoding period of each iteration, i.e. the soft information is fed through the variable node array module and the check node array module in sequence. Because the data cache module buffers the data to be decoded with increased storage depth, and the multi-core scheduling module handles codeword scheduling and control, the LDPC parallel decoding core omits, from each decoding core of the conventional LDPC decoder, both the input data cache module (which supplied single-codeword information at the decoding core's working clock frequency) and the output data cache module (which output the decoding result to the following stage at a certain clock duty cycle), thereby saving hardware resources.
The FIFO depth increase (deeper than the FIFO in the data cache module of a conventional LDPC decoder) is set as follows. Suppose M codewords must be decoded in total, of which a codewords decode correctly after N iterations, b codewords after N+1 iterations, and c codewords after N+2 iterations. The average iteration count N' required by the M codewords can be computed from the probabilities of b and c, and from this average the additional bit-depth storage the FIFO requires can be calculated. By analogy, from the numbers of codewords that need each iteration count greater than N, the depth increase of the FIFO in the data cache module of the multi-core-scheduled high-speed LDPC decoder can be obtained.
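A rough sketch of this bookkeeping follows. The function names are illustrative, and using the expected iterations beyond the baseline N as a proxy for the extra buffering is an assumption; the patent describes the idea rather than a closed formula.

```python
def average_iterations(counts):
    """counts: {iterations_needed: number_of_codewords}, e.g. a codewords at N,
    b at N+1, c at N+2. Returns the average iteration count N'."""
    total = sum(counts.values())
    return sum(n * m for n, m in counts.items()) / total

def expected_extra_iterations(counts, n_base):
    """Expected iterations beyond the baseline N per codeword: a rough proxy
    for how much deeper the input FIFO must be made."""
    total = sum(counts.values())
    return sum(max(0, n - n_base) * m for n, m in counts.items()) / total
```

For example, with 8 codewords at N = 20, one at 21, and one at 22, the average is 20.3 iterations and the expected excess over N is 0.3 iteration per codeword.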
The decoding method of the multi-core-scheduled high-speed LDPC decoder provided by the invention is as follows. The stream of words to be decoded first enters the data cache module. According to the working state of the back-end parallel decoding cores, the multi-core scheduling module assigns words to be decoded, each of a single codeword length, to decoding cores that have finished decoding and are idle. On receiving a word to be decoded, each decoding core first places it in the soft information storage module and, until the maximum iteration count is reached, decodes it by cycling through the variable node array module and the check node array module, storing each decoding result in the decoding result storage module. The multi-core scheduling module checks whether the core has finished decoding; if the decoding result passes verification, it is output to the multi-core scheduling module and the core switches to the idle state. Otherwise the core continues iterative decoding until the set maximum iteration count is reached; if the codeword still has not been decoded correctly at that point, the core is forcibly stopped, the decoding result is output, and decoding-failure information is fed back. The multi-core scheduling module then sends the next word to be decoded to an idle decoding core. After decoding is finished, the multi-core scheduling module outputs the decoding results of all decoding cores to the data cache module in the same order in which the codewords were allocated, and the data cache module outputs the decoded data.
The core of the invention is the overall control and scheduling of the whole system by the multi-core scheduling module, whose working signal depends on whether each parallel decoding core has finished decoding, i.e. a non-fixed-iteration-count decoding mechanism. Under this mechanism, if the multi-core scheduling module detects after an iteration that the codeword in some decoding core satisfies the check relation, that core immediately stops iterating, becomes an idle core, and waits for the scheduling module to allocate it a new codeword. The decoding result is checked by XOR-ing the sign bits of the output extrinsic information of every variable node i linked to check node j and taking the XOR result as the check result, as in the following formula:

  check_j = ⊕_{i ∈ N(j)} sgn(q_ij)

where i ranges over the variable nodes connected to check node j, whose set is denoted N(j) and whose size is |N(j)|; q_ij is the output extrinsic information of variable node i linked to check node j; check_j is the check output; sgn() is the sign function returning the sign bit of the extrinsic information, 0 for positive and 1 for negative; and ⊕ is the exclusive-or operation. check_j = 0 means the check relation is satisfied. If every check node satisfies check_j = 0, the codeword has been decoded correctly, or has been decoded into another codeword in the codeword space. In both cases decoding is complete and further iteration has no effect on the result, so iteration on the current codeword should stop, the decoding core becomes idle, and decoding of a new codeword can begin.
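The check above amounts to XOR-ing sign bits; a minimal sketch follows, with illustrative function names not taken from the patent.

```python
def sgn_bit(x):
    """Sign bit per the text's convention: 0 for positive, 1 for negative."""
    return 0 if x >= 0 else 1

def check_satisfied(q_to_j):
    """q_to_j: extrinsic messages q_ij from every variable node i linked to
    check node j. check_j == 0 means the parity relation holds."""
    check = 0
    for q in q_to_j:
        check ^= sgn_bit(q)
    return check == 0

def decoding_complete(messages_per_check):
    # the codeword is decoded once every check node satisfies check_j == 0
    return all(check_satisfied(q) for q in messages_per_check)
```

In hardware this is the "few LUT resources" mentioned later: one XOR tree over the sign bits per check node.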
To address the problem that conventional multi-core scheduling cannot fully utilize the decoding period, the invention improves the conventional scheduling method. The specific allocation procedure of the multi-core scheduling module is as follows. Suppose there are m LDPC parallel decoding cores. When the amount stored in the input FIFO reaches one codeword, the multi-core scheduling module starts working: it first performs the initial allocation of codewords in order of increasing core number, sending the first m codewords to the m decoding cores. After the initial allocation, the scheduling module stays in a standby detection state. For each subsequent codeword allocation it determines from the results fed back by the decoding cores which core is idle, records that core's number, temporarily stores the decoded codeword, reads a word to be decoded from the front-end data cache module, allocates it to the idle core, and then returns to the standby detection state to wait for the next core to finish. Subsequent codewords are thus scheduled one by one to whichever decoding core is idle (without waiting for all m cores to finish before a new round of allocation, so that each core's decoding time is fully used). If several decoding cores finish decoding at the same time, words to be decoded are distributed in order of increasing core number.
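The allocation policy just described (an initial round-robin pass, then each new codeword goes to whichever core frees up first, with the smaller core number winning ties) can be sketched as a small discrete-event simulation. This is a hypothetical model that uses iteration counts as decode times, not the patent's hardware implementation.

```python
import heapq

def schedule(iters_needed, num_cores):
    """Returns (makespan in iteration-times, allocation order as (codeword, core))."""
    events = []   # min-heap of (finish_time, core_id, codeword)
    order = []
    next_cw = 0
    # initial allocation: the first m codewords, in order of increasing core number
    for core in range(min(num_cores, len(iters_needed))):
        heapq.heappush(events, (iters_needed[next_cw], core, next_cw))
        order.append((next_cw, core))
        next_cw += 1
    makespan = 0
    while events:
        t, core, _ = heapq.heappop(events)    # earliest finisher; ties -> smaller core id
        makespan = max(makespan, t)
        if next_cw < len(iters_needed):       # standby detection: refill the idle core
            heapq.heappush(events, (t + iters_needed[next_cw], core, next_cw))
            order.append((next_cw, core))
            next_cw += 1
    return makespan, order
```

With four codewords needing [5, 1, 1, 1] iterations on two cores, core 2 decodes the three short codewords back to back while core 1 works on the long one, so all four finish within 5 iteration-times.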
The application effect of an embodiment of the present invention is given below.
Take an LDPC code with a code length of 12288 bits and an information bit length of 10240 bits as an example; the decoding process has no maximum iteration limit, and the stopping condition is correct decoding.
Fig. 4 shows the number of iterations each codeword needed to decode correctly: the abscissa is the codeword index and the ordinate is the number of iterations performed when that codeword decoded correctly. In the figure the maximum iteration count is 29 and the minimum is 12; the iteration counts cluster densely around 16, and only 0.2% of the codewords need the maximum number of iterations. In a conventional parallel decoder, to meet the decoding requirement while keeping control simple, a maximum iteration count is usually set and every decoding period is sized to it. For this embodiment the maximum would be set to 29, yet only 0.2% of the codewords actually need 29 iterations; 99.8% of them never reach the maximum, which wastes iterations and therefore hardware resources.
Fig. 5 plots iteration count against decoding performance: the abscissa is the maximum iteration count, the ordinate the bit error rate of the decoding result at that count. At a suitable signal-to-noise ratio, increasing the iteration count markedly lowers the BER, by roughly one order of magnitude for every two additional iterations. By adopting the novel multi-core-scheduled parallel decoding mode with a non-fixed iteration count, the time saved on codewords that need few iterations is given to codewords that need more iterations but can still decode correctly, which effectively raises the overall effective iteration count while lowering the average iteration count. From the standpoint of decoding computation, the average iteration count correlates positively with the decoder's working clock: a small average iteration count allows a low working clock, a low clock rate helps hardware circuit stability, and the computation at the same decoding throughput is reduced. Fig. 6 shows the relationship between the maximum iteration count and the computation saved: the abscissa is the maximum iteration count, the ordinate the percentage of computation saved. In this embodiment the average iteration count falls from 29 to 16 and the computation falls by 46.7%. Hardware resources are thus used more fully than in the traditional method, and the computation is greatly reduced, so that at the same performance target the invention markedly improves overall system efficiency and saves hardware resources.
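The order-of-magnitude-per-two-iterations trend quoted above can be turned into rough arithmetic; this helper is purely illustrative of that rule of thumb, not a formula from the patent or a measured curve.

```python
def ber_estimate(ber_at_n1, n1, n2):
    """Extrapolate BER from n1 to n2 iterations, assuming roughly one order
    of magnitude improvement per two additional iterations (the text's
    approximation at a suitable signal-to-noise ratio)."""
    return ber_at_n1 * 10.0 ** (-(n2 - n1) / 2.0)
```

For example, a BER of 1e-3 at 10 iterations would extrapolate to about 1e-5 at 14 iterations under this rule of thumb.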
The novel multi-core scheduling greatly improves the utilization of the decoding cores while introducing two additional overheads: a check-relation inspection module added at the check nodes, and an increase in the depth of the input FIFO. The inspection module is essentially an exclusive-OR operation over the sign bits of all the input extrinsic information, which amounts to only a few LUT resources in hardware.
In the traditional mode, the maximum iteration number N is usually set according to the actual data rate, the decoding performance index and the decoding rate supported by the hardware, and the FIFO depth increase required by the novel multi-core scheduling mode is calculated on the basis of N. Suppose a total of 10 codewords are to be decoded. If 9 codewords are correctly decoded after N iterations and 1 codeword after N+1 iterations, the probability that the FIFO must be deepened is the probability of that 1 codeword occurring; if 8 codewords are correctly decoded after N iterations, 1 after N+1 iterations and 1 after N+2 iterations, the probability that the FIFO must be deepened is the probability of the N+1 and N+2 events each occurring. Since the moment at which a codeword requiring more than N iterations occurs varies, and its effect on the FIFO depth varies accordingly, the probability calculation must also account for the order in which the codewords occur. Fig. 7 shows the calculated FIFO depth increase after each codeword is decoded, where the abscissa is the serial number of the codeword to be decoded and the ordinate is the number of bits currently buffered in the FIFO. It can be seen that when the maximum iteration number is reduced from 29 to 16, the input FIFO depth needs to grow by only two codewords while the decoding performance is greatly improved; since the maximum number of buffered bits in Fig. 7 never exceeds two codewords, a FIFO deepened by two codewords cannot fill up during the whole decoding process. Conversely, as Fig. 6 shows, if the iteration number is raised from 16 to 29, the input FIFO grows by only two codeword depths while the error rate falls from the order of 10^-4 to the order of 10^-9; compared with the small LUT usage of the check-relation inspection module and the improvement in decoding performance, this FIFO depth increase is entirely acceptable. Therefore, the invention greatly reduces the computation of the traditional method at very small hardware cost, markedly improving the overall efficiency of the system and saving overall hardware resources.
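The FIFO sizing argument can be checked with a small backlog simulation in the spirit of Fig. 7. The model below is an assumption, not the patent's calculation: codewords arrive at a steady rate equivalent to one codeword per `budget` iterations, while each codeword actually occupies its decoding slot for its own iteration count; the iteration counts are hypothetical.

```python
def max_fifo_backlog(iter_counts, budget):
    """Worst-case input-FIFO backlog, in codeword units, when input arrives
    at a steady rate of `budget` iterations' worth of time per codeword
    while codeword k actually takes iter_counts[k] iterations to decode."""
    backlog_ticks = 0  # backlog measured in iteration "ticks"
    worst = 0
    for actual in iter_counts:
        # each arrival grants `budget` ticks of decode time; the codeword
        # consumes `actual` ticks, so the surplus (if any) accumulates
        backlog_ticks = max(0, backlog_ticks + actual - budget)
        worst = max(worst, backlog_ticks)
    return worst / budget  # express the backlog in codeword units

counts = [16, 16, 12, 29, 16, 15, 29, 16, 16, 17]
print(max_fifo_backlog(counts, budget=16))  # 1.625
```

Even with two codewords needing 29 iterations, the backlog in this toy trace stays under two codeword depths, consistent with the two-codeword FIFO extension argued above.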

Claims (3)

1. A high-speed parallel low-density parity check decoder with multi-core scheduling, implemented on an FPGA chip and comprising a data cache module, a multi-core scheduling module and an LDPC parallel decoding core consisting of a plurality of decoding cores, connected in sequence; characterized in that: the data cache module is composed of a FIFO (first-in first-out) inside the FPGA, the FIFO having a storage depth deep enough to guarantee sufficient cache space when the next codeword arrives; the multi-core scheduling module controls the scheduling of the whole high-speed LDPC decoder: when more than one codeword of data is stored in the preceding data cache module, it sends a decoding start signal to the subsequent parallel decoding cores and allocates the cached data to each decoding core for decoding, while also receiving the decoding end signals fed back by the parallel decoding cores, checking whether each decoding core is in an idle state, and sending the next word to be decoded in the preceding data cache module to an idle decoding core for decoding; each of the LDPC parallel decoding cores comprises a soft information storage module, a variable node array module, a control module, a check node array module, a variable node storage module, a check node storage module and a decoding result storage module, the soft information storage module, the variable node array module and the control module being connected in sequence; the variable node array module is further connected with the check node storage module and the variable node storage module:
the amount of depth increase for setting the FIFO is specified as follows: if M code words are required to be decoded, wherein a code word is decoded correctly after iteration is carried out for N times, b code word is decoded correctly after iteration is carried out for N +1 times, and c code word is decoded correctly after iteration is carried out for N +2 times, the required average iteration times N' of the M code words can be obtained by calculation according to the probabilities of b and c, and the bit depth storage amount required to be increased by FIFO can be calculated according to the average iteration times; by analogy, different numbers of iteration times larger than N are required according to different numbers of code words;
the specific method for the multi-core scheduling module to allocate the code words to be allocated is as follows: setting m LDPC parallel decoding cores, when the storage amount in an input FIFO reaches one code word, starting the multi-core scheduling module to work, firstly performing the primary distribution of the code words according to the sequence of the serial numbers of the decoding cores from small to large, and respectively sending the first m code words to the m decoding cores for decoding; after initial allocation, the multi-core scheduling module is always in a standby detection state, when the multi-core scheduling module needs to allocate each subsequent code word, which decoding core is in an idle state is judged according to a result fed back by the decoding core, the number of the decoding core is recorded, the code word of which decoding is completed is temporarily stored, meanwhile, a word to be decoded is read out from the data cache module, and is allocated to the decoding core in the idle state for decoding, and then the multi-core scheduling module continuously enters the standby detection state to wait for the next decoding core to complete decoding; sequentially scheduling the subsequent code words to the decoding cores in the idle state respectively for decoding; and if the decoding is finished by the decoding cores at the same time, distributing the words to be decoded according to the sequence of the serial numbers of the decoding cores from small to large.
2. A decoding method of the decoder according to claim 1, wherein a stream of words to be decoded first enters the data cache module; according to the working state of the rear-end parallel decoding cores, the multi-core scheduling module allocates a word to be decoded of a single codeword length to a decoding core that is idle after completing decoding; after receiving the word to be decoded, each decoding core first places it in the soft information storage module and decodes it iteratively through the variable node array module and the check node array module, storing each decoding result in the decoding result storage module; whether the codeword has been successfully decoded is checked after each iteration, and if the check passes, the decoding result is output to the multi-core scheduling module and the decoding core switches to the idle state; otherwise, the decoding core continues iterative decoding until the set maximum iteration number is reached, at which point, if decoding is still unfinished, decoding of the current codeword is forcibly stopped, the decoding result is output, and decoding failure information is fed back; the multi-core scheduling module then sends the next word to be decoded to an idle decoding core for decoding; and after decoding is finished, the multi-core scheduling module outputs the decoding results of the decoding cores to the data cache module uniformly, in the same order in which the codewords were allocated, and the data cache module outputs the decoded data.
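The per-core control flow of this method (iterate, check after each iteration, stop early on success, force-stop at the maximum) can be sketched as a skeleton with a pluggable iteration step and parity check. Both are mocked here; in the actual core the step would run the variable and check node arrays:

```python
def decode_codeword(step, passes_check, max_iters):
    """Run up to max_iters decoding iterations, stopping as soon as the
    parity check passes. Returns (success, iterations_used). `step`
    performs one variable/check-node update; `passes_check` tests parity."""
    for it in range(1, max_iters + 1):
        step()
        if passes_check():
            return True, it       # decoded: core reports done and goes idle
    return False, max_iters       # forced stop: report decoding failure

# Mock core whose check happens to pass on iteration 5.
state = {"it": 0}
ok, used = decode_codeword(
    step=lambda: state.__setitem__("it", state["it"] + 1),
    passes_check=lambda: state["it"] >= 5,
    max_iters=29,
)
print(ok, used)  # True 5
```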
3. The decoding method according to claim 2, wherein the decoding result is checked by performing an exclusive-OR operation over the sign bits of the output extrinsic information of every variable node i having a link relationship with check node j, and taking the result of the exclusive-OR as the check result, as shown in the following formula:

check = ⊕_{i=1..n_j} sgn(q_ij)

where i denotes a variable node connected to check node j and n_j is the number of such variable nodes; q_ij represents the output extrinsic information of the variable node i linked with check node j; check is the output result of the check; sgn() is the sign function, i.e., it returns the sign bit of the output extrinsic information, 0 for positive and 1 for negative; and ⊕ denotes the exclusive-OR operation. check = 0 indicates that the check relation is satisfied; if all check nodes satisfy the check relation check = 0, the codeword has been decoded correctly, or has been decoded into another codeword in the codeword space; both cases indicate that decoding is complete and the decoding core enters the idle state.
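The check in claim 3 is a few lines of logic: XOR the sign bits of the extrinsic messages entering check node j, mapping non-negative values to 0 and negative values to 1 as the claim specifies. A minimal sketch:

```python
def check_node_satisfied(q_j):
    """Parity check for one check node: XOR the sign bits (0 = positive,
    1 = negative) of the extrinsic messages q_ij from its linked variable
    nodes. Returns True when check == 0, i.e. the relation is satisfied."""
    check = 0
    for q in q_j:
        check ^= 0 if q >= 0 else 1
    return check == 0

print(check_node_satisfied([0.8, -1.2, -0.3]))  # True: two sign bits cancel
print(check_node_satisfied([0.8, -1.2, 0.3]))   # False: one sign bit remains
```

In hardware this reduces to an XOR tree over the message sign bits, which is why the overhead is only a few LUTs.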
CN201710031380.9A 2017-01-17 2017-01-17 High-speed parallel low-density parity check decoder with multi-core scheduling and decoding method thereof Active CN106911336B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710031380.9A CN106911336B (en) 2017-01-17 2017-01-17 High-speed parallel low-density parity check decoder with multi-core scheduling and decoding method thereof

Publications (2)

Publication Number Publication Date
CN106911336A CN106911336A (en) 2017-06-30
CN106911336B true CN106911336B (en) 2020-07-07

Family

ID=59206495


Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110739975B (en) * 2019-09-20 2021-06-11 华中科技大学 Variable node multiplexing method of semi-random decoder
CN111126044B (en) * 2019-12-24 2023-06-27 北京中安未来科技有限公司 VIN code multiple verification method and VIN code identification method and device using confidence
CN113300809B (en) * 2020-02-24 2022-08-16 大唐移动通信设备有限公司 Data processing method and device
EP4133631A4 (en) 2020-04-07 2023-09-27 Telefonaktiebolaget LM ERICSSON (PUBL) Network node and method for improved decoding in a network node
US20230353169A1 (en) * 2020-06-23 2023-11-02 Telefonaktiebolaget Lm Ericsson (Publ) Network node and method performed therein for handling received signal
CN113381769B (en) * 2021-06-25 2023-02-07 华中科技大学 Decoder based on FPGA

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1937413A (en) * 2006-09-30 2007-03-28 东南大学 Double-turbine structure low-density odd-even check code decoder
CN101854177A (en) * 2009-04-01 2010-10-06 中国科学院微电子研究所 High-throughput LDPC encoder
CN101977063A (en) * 2010-11-01 2011-02-16 西安空间无线电技术研究所 General LDPC decoder
CN102480336A (en) * 2010-11-30 2012-05-30 中国科学院微电子研究所 General rapid decoding coprocessor of quasi-cyclic low density parity check code
CN103618556A (en) * 2013-12-11 2014-03-05 北京理工大学 Partially parallel quasi-cyclic low-density parity-check (QC-LDPC) decoding method based on row message passing (RMP) scheduling
US9413390B1 (en) * 2014-07-16 2016-08-09 Xilinx, Inc. High throughput low-density parity-check (LDPC) decoder via rescheduling

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4138700B2 (en) * 2004-05-31 2008-08-27 株式会社東芝 Decoding device and decoding circuit
CN101350625B (en) * 2007-07-18 2011-08-31 北京泰美世纪科技有限公司 High-efficiency all-purpose decoder for QC-LDPC code and decoding method thereof
WO2009143375A2 (en) * 2008-05-21 2009-11-26 The Regents Of The University Of Calfornia Lower-complexity layered belief propagation deconding ldpc codes

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Improving the Belief-Propagation Convergence of Irregular LDPC Codes Using Column-Weight Based Scheduling";Chaudhry Adnan Aslam等;《IEEE Communications Letters》;20150611;第19卷(第9期);第1283页到第1286页 *
"一种新的LDPC译码器设计";王锦山等;《系统工程与电子技术》;20081031;第30卷(第10期);第2031页到第2035页 *
"基于多核处理器架构的LTE PUSCH信道解调译码并行处理设计";张自然等;《中兴通讯技术》;20090228;第15卷(第1期);第53页到第56页 *
"多元LDPC译码器的设计与实现";黎海涛等;《高技术通讯》;20131215;第23卷(第12期);第1299页到第1307页 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant