WO2007082626A2 - Method and apparatus for error correction decoding - Google Patents

Method and apparatus for error correction decoding

Info

Publication number
WO2007082626A2
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
memory
decoding
information
accumulator
Prior art date
Application number
PCT/EP2006/070101
Other languages
French (fr)
Other versions
WO2007082626A3 (en)
Inventor
Stefan Müller
Manuel Schreger
Marten Kabutz
Original Assignee
Thomson Licensing
Priority date
Filing date
Publication date
Application filed by Thomson Licensing filed Critical Thomson Licensing
Publication of WO2007082626A2 publication Critical patent/WO2007082626A2/en
Publication of WO2007082626A3 publication Critical patent/WO2007082626A3/en

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words, using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/11 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words, using block codes with multiple parity bits

Definitions

  • Decoding Principle E applies the turbo decoding principle to the TPMP basis in a different way and without increasing the complexity of the system. Even so, it increases the convergence speed compared to TPMP and decreases the overall memory consumption.
  • the sizes of all three matrices depend on the code configuration of the LDPC code.
  • the row weight w_r of H is constant in one code configuration and depends on the code configuration (number of ones in a row).
  • the column weight of A(1) is w_c and also depends on the code configuration.
  • the matrix T is referred to as the Triangular part of H.
  • the turbo decoding principle as described above, i.e. performing symbol node update steps immediately before performing the check node update step, is only applied on T, i.e. on those columns that belong to T.
  • the column degree of T is 2.
  • this parity check matrix for decoding can be visualised with a Log-Likelihood matrix or LLR matrix where one LLR value is associated to every set element of the parity check matrix.
  • the LLR matrix is also called H here and in the following. It is clear from the respective context whether H denotes the parity check matrix proper or the LLR matrix.
  • the first column group SNG_0 represents the matrix part A(1)
  • the second column group SNG_1 represents the matrix part A(2)
  • this example has a different dimension than the code of DVB-S2.
  • the submatrix denoted as T* has a special structure, namely
  • a method for decoding structured LDPC coded data performs the symbol node update steps immediately before consecutive ones of the check node update steps only in contiguous columns of the parity check matrix that have column weight 2, uses a single buffer for each of the contiguous columns, computes the intrinsic information anew for each row of the parity check matrix by column-wise accumulating the a-priori information and the extrinsic information, writes the new resulting extrinsic information over the old extrinsic information, and uses the new resulting extrinsic information in the next check node update step.
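For illustration, here is a minimal Python sketch of this weight-2 column handling. The data layout, the names and the generic check_update function are our assumptions, not the patent's implementation:

    def decode_t_part(Lc, buf, t_checks, check_update):
        # Sketch for the weight-2 columns of the triangular part T:
        # buf[n] holds the extrinsic value of the *other* check of
        # column n, so a single buffer per column suffices.
        for m, cols in t_checks:
            # Symbol node update immediately before the check node update:
            # intrinsic = a-priori information plus stored extrinsic.
            intrinsic = [Lc[n] + buf[n] for n in cols]
            new_ext = check_update(intrinsic)
            for n, e in zip(cols, new_ext):
                buf[n] = e  # the next check of column n uses this value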
  • In Decoding Principle F, a principle similar to turbo decoding is applied on nodes wherever possible, and a principle similar to two phase message passing is applied on nodes in those submatrices that are defined via two or more permutation matrices. The way this is done results in a straightforward timing scheme. It also allows a partitioning of the computation cores and therefore simplifies the routing inside an integrated circuit realizing the Decoding Principle.
  • Decoding Principle F has at least the following advantages:
  • the check processing units can be located at the memories;
  • NB: Here, Eq. (F1) assumes the permutation matrix I^1 to be defined as a cyclic right shift.
  • NB2: Eq. (F1) and following, illustrating the case of binary symbols, use "Bit Node Group" (BNG) as a synonym for "Symbol Node Group" (SNG).
  • the edge memory layout results as:
  • the first element at each address of the edge memory corresponds to the first sub equation of the associated bit node group
  • the LLR values corresponding to the parity check matrix are ordered row-wise, i.e. the i-th element at any address contains the LLR value in the i-th row of the permutation matrix associated to the address.
  • Decoding Principle F amounts to mixing TPMP and TDMP using a-posteriori summation.
  • the Decoding Principle F is applicable for all LDPC codes, even for those where some submatrices of the parity check matrix are defined by more than one permutation matrix.
  • a method for decoding structured LDPC coded data according to Decoding Principle F uses an edge memory organised in words associated to the permutation matrices constituting the submatrices of the parity check matrix, each edge memory word comprising likelihood elements to store one likelihood value for every bit node of a bit node group.
  • in the first likelihood element, the likelihood corresponding to the first sub equation of the associated bit node group is stored; in the i-th likelihood element, the likelihood associated to the i-th row of the permutation matrix associated to the edge memory word is stored; and during computation of the intrinsic information, a cyclic shift of likelihood values is performed at the accumulator.
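The following Python sketch (using numpy) illustrates this word layout and the cyclic shift at the accumulator; the shift direction and all names are our assumptions:

    import numpy as np

    def intrinsic_for_word(acc_group, edge_word, shift):
        # acc_group holds one a-posteriori value per bit node of the
        # group; edge_word holds one extrinsic value per row of the
        # permutation matrix I^shift. The accumulator values are
        # cyclically shifted according to the permutation before the
        # old extrinsic values are subtracted.
        shifted = np.roll(acc_group, -shift)  # cyclic shift at the accumulator
        return shifted - edge_word            # intrinsic values, row-ordered

    acc = np.array([1.0, 2.0, 3.0, 4.0])  # A_BNG for S = 4
    ext = np.zeros(4)                     # edge memory word, initialized to 0
    print(intrinsic_for_word(acc, ext, shift=1))  # [2. 3. 4. 1.]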
  • Fig. 1 shows an example decoder having 3 check equations
  • Fig. 2 shows the structure of the parity matrix of DVB-S2
  • Fig. 3 shows an architecture for Turbo decoding according to the invention for the case of unstructured code
  • Fig. 4 shows a Tanner graph of a structured LDPC code
  • Fig. 5 shows decoding according to the invention for the case of structured code
  • Fig. 6 shows a principle diagram of computing the extrinsic information
  • Fig. 7 shows the buffer contents at successive iterations and subiterations
  • Fig. 8 shows the buffer contents for Decoding Principle C at the times of iteration and sub-iteration
  • Fig. 9 shows the accumulator values in the iteration and sub-iteration steps
  • Fig. 10 shows the final decision values at the end of the iteration
  • Fig. 11 shows the block diagram of an LDPC decoder hardware according to Decoding Principle E
  • Fig. 12 shows an example of a memory initialization
  • Fig. 13 shows a suggested hardware architecture for Decoding Principle F
  • Fig. 14 shows accumulator memory content for an example of Decoding Principle F
  • Fig. 15 shows edge memory content for an example of Decoding Principle F
  • Fig. 1 shows 3 check equations corresponding to 3 rows of a parity check matrix. All of Lc, Li, Le0, Le1 and Le2 are sets of LLR values. In conventional decoding, first all check equations are decoded or evaluated, for instance using dedicated Check Processing Units, also denoted as CPUs. The resulting extrinsic information Le(i) is saved in a so-called edge memory. This corresponds to "step I" as described above. Afterwards the accumulation is performed and the new intrinsic information Li is updated and saved again in the edge memory, corresponding to step II. Hence an edge buffer is needed for sets Le0, Le1 and Le2, corresponding to a total of three buffers.
  • this results in less memory and in fewer write accesses to the buffers.
  • the algorithm converges faster than conventional decoding, which also contributes to decreasing the number of buffer write accesses.
  • the decoding is best explained with an example:
  • N = 5 memory buffers of size {3,2,2,2,2}, denoted by P_0, P_1, P_2, ..., P_4, are needed for the example of Eq. (4); their content is initialized to zero.
  • P_j^i shall denote the i-th element of P_j.
  • the memory requirement depends on the number of ones or non-zero entries per column of the parity check matrix H. If, in the parity matrix, we denote as w_c^j the column weight of the j-th column, the overall memory consumption can be calculated as Mem = sum_{j=0}^{N-1} (w_c^j - 1).
  • the number of memory read accesses per subiteration is w_c*w_r.
  • the number of write accesses per subiteration is w_r. Since each iteration consists of M subiterations, the total numbers of read and write accesses are w_c*w_r*M and w_r*M, respectively. This compares to the total numbers of memory read and write accesses in the conventional decoding, which are 2(w_r*M + N) and 2*w_r*M, respectively.
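To put rough numbers on this comparison, here is a small Python computation with illustrative values in the spirit of DVB-S2 (w_c, w_r, M and N are example figures, not exact parameters of the standard):

    # Accesses per iteration for the new schedule vs. conventional TPMP.
    w_c, w_r, M, N = 3, 6, 32400, 64800

    reads_new,  writes_new  = w_c * w_r * M, w_r * M
    reads_conv, writes_conv = 2 * (w_r * M + N), 2 * w_r * M

    print(reads_new, writes_new)    # 583200 194400
    print(reads_conv, writes_conv)  # 518400 388800 -> write accesses halved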
  • Figure 2 shows the structure of the parity matrix of DVB-S2.
  • Fig. 3 shows a hardware architecture of decoding according to the invention for the case of unstructured code.
  • the memory holds the channel LLR values and the extrinsic information.
  • the edge configuration memory holds the parity check matrix configuration; it represents the sparse parity check matrix in a compact memory.
  • the accumulator is simply an adder, which accumulates extrinsic information and can be reset by the controller.
  • the hard decision unit performs the hard decision according to the sign of the LLR value.
  • the CPU (check processing unit) computes the extrinsic information; it can be implemented using tanh and its approximation, or using BCJR [3]/MAP/log-MAP based on a trellis.
  • N memory buffers are denoted as P_0, P_1, P_2, ..., P_{N-1} and are initialized to zero.
  • P_n^i denotes the i-th element of P_n.
  • Fig. 4 depicts the Tanner graph of such a code.
  • Such structured codes can also be expressed via a parity matrix
  • SNG_0 consists of the bits c_0, c_1, c_2, c_3.
  • each submatrix of H is of size 4×4.
  • word 0 has to be cyclically shifted right by one symbol and combined with the same word cyclically shifted right by two symbols.
  • After processing in the S = 4 check node update units, the words are shifted back and saved in memory. In short, the words are shifted, combined, shifted back and saved in memory.
  • This procedure can also be performed sequentially due to the fact that the computation in the CPU is associative.
  • Decoders for structured or architecture-aware codes are easily obtained from those of normal decoding using additional interleavers and deinterleavers.
  • Fig. 5 depicts appropriate hardware for decoding structured code using the decoding according to the invention.
  • the decoding algorithm is:
  • the size of the memory buffer corresponds to the column weight minus one.
  • the word width corresponds to the submatrix size S.
  • the resulting bit-width of one word is S * nb_bits_per_LLR.
  • the memory buffers can be in one physical memory.
  • At iteration k, carry out decoding sub-iterations according to the number of check node groups. In sub-iteration t, compute for each check node group the extrinsic information Q_n using the channel information L_i and the extrinsic information saved in P_j:
  • with L_{i,j} denoting the j-th LLR value of the extrinsic information stored at address i.
  • the overall memory consumption in this example is 7*S LLR values.
  • k denotes the time or iteration step.
  • the new input values for Phase (I) of the equation for C_0 and Bit Node Group BNG_0 are:
  • the depth of the memory buffer corresponds to the column weight minus one.
  • N = 2 memory buffers of size {3,3}, denoted by P_0 and P_1 and initialized to zero.
  • P_n^i denotes the i-th element of P_n.
  • t designates the sub-iteration
  • k again designates the iteration step.
  • the overall memory consumption in this example is 6* S LLR values.
  • the memory reduction in that simple example is one memory word, but it can be very large if fewer additional buffers are introduced.
  • the size of each sub-matrix H(m,n) is S×S.
  • the extrinsic information is saved in P_j or B_j. For this we first have to compute the intrinsic or input information of each check node group for each sub-matrix; we denote this information as L_x. During computation of L_x it has to be distinguished whether the extrinsic information of L_x belongs to a sub-matrix with one or with more than one permutation matrix. Perform for each entry in the check node group the following computation.
  • L_x is computed as
  • L_x is computed by using the second step of the two phase message passing.
  • the intrinsic information of the check equations is obtained.
  • the information is permuted according to the permutation matrices. This is accomplished by cyclically shifting the L_x values; the resulting values are denoted as L*_x.
  • Fig. 6 shows a principle diagram of computing the extrinsic information, i.e. the principle of this concatenation. After computation of the tanh function we get the new extrinsic information. The values are shifted back according to the sub-matrix permutation offset and saved in memory. For example, shifting the first transposed column of the above equation yields the new extrinsic information L_{0,1} (in this case no shifting is necessary).
  • L_0 is saved twice, namely in P_0^(cntr_0) and in B_0^(0); L_1 is saved in P_1^((cntr_1 + 1) mod (w_r - 1)).
  • Fig. 7 shows the buffer contents at successive iterations and subiterations.
  • P_0^0 stands for P_0^(0), and so on.
  • the shaded table cells with bold font inscription symbolize the pointer positions of the counters. These positions are overwritten in the next sub-iteration step.
  • the accumulator values are
  • the sub-matrix joining CNG_0 and BNG_0 is defined via two permutation matrices; therefore we have to add the additional buffer contents to the accumulator group A_BNG0 and to compute it with the two phase message passing approach.
  • the intrinsic information from CNG_0 and BNG_1 is computed according to the turbo decoding principle:
  • Accumulator Buffer A_BNG doesn't have to be prepared because the bit node group is defined via one permutation matrix.
  • Fig. 8 shows the buffer contents for Decoding Principle C at the times of iteration and sub-iteration. Same as in Fig. 7, the shaded boxes with the bold type inscriptions symbolize the pointer positions of the counters. These positions are overwritten in the next sub-iteration step.
  • Fig. 9 shows the accumulator values in the iteration and sub-iteration steps.
  • Decoding Principle C is applicable to decoding apparatus and methods for any LDPC code where at least some of the submatrices are defined by more than one permutation matrix.
  • each accumulator can be realised as an arithmetic-logic unit, and each associated accumulator buffer as a memory cell.
  • In Decoding Principle D, same as for Decoding Principle C, an accumulator is introduced for each received data bit, and the accumulator value is saved in an Accumulator Buffer A_n.
  • the accumulator is initialized with the associated channel information:
  • A_n = L_c^n (D3), and the edge memories P are initialized with zero.
  • the Log-Likelihood-Ratios in an n-th Bit Node Group BNG_n are denoted as a Channel Information Group L_BNGn, for which the associated accumulators are grouped into a corresponding Accumulator Group denoted as A_BNGn.
  • the accumulator buffer can be the a-priori/channel buffer L_c.
  • the accumulator always contains the a-posteriori information.
  • the input information of the check equations is obtained.
  • the information is permuted according to the permutation matrices. This is accomplished by cyclically shifting the L_x values; the resulting values are denoted as L*_x.
  • Compute the a-posteriori value by adding the decoding result to the accumulator. For this it must be distinguished whether the extrinsic information belongs to one or to more permutation matrices. In the case of one permutation matrix:
  • Decoding Principle D is applicable to every LDPC code, even to unstructured ones, where it is simpler.
  • the edge memory layout is:
  • One memory place for storing an LLR value is associated to each non-null entry in the Parity Check matrix, i.e. overall memory consumption is (13*S) LLR values.
  • time step k is advanced by one.
  • the overall memory consumption in this example using conventional decoding is (10*S) LLR values.
  • this initialization yields the memory contents as:
  • Reading addresses 1 and 4 does exactly the same for L_1^intr.
  • the handling for addresses 7 and 11 is different.
  • the memory always contains, at address 7, the extrinsic value for the next check node update of that column.
  • L^intr_{4,8} directly corresponds to Eq. (E27).
  • the vertical process of the matrix part T is included in the horizontal process.
  • the resulting LLR word is
  • L_{0,0} has to be combined with L_{0,3}, with L_{0,4} and with L_{0,8}.
  • the first value of L_0 and the first value of L_1 are the correct values, which are to be combined. Now the whole vector can be computed at once. Hence all four equations, i.e. the new extrinsic information of the whole check node group, can be updated in parallel with
  • the first column corresponds to the computation of Eq. (E33) above. Cyclically shift back the extrinsic values
  • Fig. 11 shows the block diagram of an LDPC decoder hardware according to Decoding Principle E.
  • the data is saved in the input memory "DEINT".
  • the LDPC decoder is started via the start signal "START Decoder" once one frame has been completely received in the "DEINT" buffer.
  • the iteration control starts the individual processes of the LDPC decoder.
  • the process consists of three phases, namely initialization, decoding and output phase.
  • the initialization phase is done by the block "load_cont”
  • the decoding is done by the "run_cont” and "bpu_cont” blocks
  • the output phase is done by the block "bit_stream” .
  • the iteration control block sets up the corresponding data path via the multiplexers MUX1, DEMUX3, MUX4, MUX7 and MUX6.
  • load_cont has control of the input memory and the edge memory.
  • the iteration control block sends a start pulse to the block "load_cont”.
  • the load_cont block reads the data contents of the input memory and writes the data to the edge memory.
  • the parity check matrix of the LDPC code can be split into
  • the number of check node groups depends on the code configuration and the code rate.
  • the matrix A can be further split into two matrices
  • the i in A_i is an index and not a "power-of" operation.
  • the edge memory is initialized in a column-oriented way. That means that for each column the corresponding channel LLR value is written w_r times into successive memory locations in the edge memory. This is done for A_1 and A_2.
  • a conventional LDPC decoder would also initialize the matrix T.
  • In Decoding Principle E, the turbo decoding principle is used on these nodes, so that the memory necessary for that part of the matrix is reduced by a factor of 2 and is initialized with zero. This reduces the overall memory consumption of the LDPC decoder.
  • the memory consumption necessary for the edges is 720×2160 bits; a conventional decoder needs 792×2160 bits.
  • BNG in the figure means symbol or bit node group and denotes the actual column group of the parity matrix.
  • CNG means check node group and denotes the row group of the parity matrix.
  • N_A1,bitgrp and N_A2,bitgrp denote the start of the next part of the parity check matrix.
  • the even addresses correspond to the edge memory 1 and the odd addresses correspond to the edge memory 2.
  • This code configuration requires 600 edges; a conventional decoder would require 600 memory locations.
  • the new approach needs only 540 memory locations, without losing any edges, by exploiting the fact that one memory location is used for two edges.
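A minimal Python sketch of this column-oriented initialization, under the assumption of a generic list-based edge memory (names are ours):

    def init_edge_memory(Lc_columns, w_r, t_size):
        # For every column of the A part, the channel LLR value is
        # written w_r times into successive edge memory locations;
        # the T part, handled by the turbo decoding principle, only
        # needs half the size and is initialized with zero.
        edge_A = []
        for llr in Lc_columns:
            edge_A.extend([llr] * w_r)
        edge_T = [0.0] * t_size
        return edge_A, edge_T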
  • the block "load_cont" sends an acknowledge back, indicating to the iteration control block, that decoder is ready for decoding.
  • the decoding process comprises a check node update and a bit node update.
  • run_cont has control of the input memory, edge memory, adder, shifter and cpu/bpu units.
  • the iteration control block sends a start pulse to the block "run_cont”.
  • the block now updates the intrinsic information in the edge memory with the extrinsic information. This is done in a sequential manner by reading and writing check node group by check node group.
  • check node group 0 (CNG 0) comprises the addresses 0, 157, 480, 539.
  • the block reads the word at address 0 (edge memory 1) and feeds the value through the shifter to the check processing units (CPUs), together with a synchronization signal indicating the start of a check node group. Then it feeds the check processing units with the word at address 157 (edge memory 2).
  • check node group 1 comprises the addresses 13, 158, 480, 481.
  • address 480 is in the T part of the matrix.
  • the channel value of bit node group 120 is read from the input memory, the previously written word at address 480 is read from the edge memory, and the two are added together.
  • the block "run_cont" reads an internal ROM memory, containing information of edge configuration and timing information of memories, shifter and adder. Due to the partition of the memory it is possible to read one memory, while the other is being written and vice versa.
  • the block "bpu_cont” has control of the input memory, edge memory and cpu/bpu units.
  • the iteration control block sends a start pulse to the block "bpu_cont”.
  • the block now updates the extrinsic information in the edge memory with the intrinsic information. This is done in a sequential manner by reading and writing bit/symbol node group by bit/symbol node group.
  • bit/symbol node group 0 (BNG 0) comprises the addresses 0, 1, ..., 12.
  • the block reads the word at address 0 (edge memory 1) and feeds the value to the bit/symbol processing units (BPUs), together with a synchronization signal indicating the start of a bit node group.
  • the BPU computes the a-posteriori and the intrinsic value.
  • the a-posteriori value is used by the block "bit_stream” .
  • the intrinsic values are saved at the same locations where the extrinsic values were located before, hence addresses 0, 1, ..., 12.
  • Matrix part T is omitted.
  • the bit decisions are available at the output of the BPU units. If the block "iter_control" decides to stop the iteration, it sends a pulse on the signal "start output” before the last BPU is started. This causes the decisions during the last BPU process to be saved in the output buffer and sent to output.
  • A_n = L_c^n (F3), and the edge memory is initialized with zero.
  • the accumulator group for the channel information of the subvector L_BNGn of bit node group n is denoted as A_BNGn.
  • the accumulator buffer can be the a-priori/channel buffer L_c.
  • the accumulators always contain the a-posteriori information.
  • decoding sub-iterations corresponding to the number of check node groups are carried out:
  • the a-posteriori information is available after each sub-iteration and corresponds to the accumulator buffer values A_BNG. Due to the fact that the a-posteriori sum is updated during each subiteration, the decoding process can be stopped at any time.
  • Figure 13 shows a suggested hardware architecture for Decoding Principle F.
  • the accumulator memory contains the a-posteriori sum and is initialized with the channel LLR values according to Eq. (F3). These values are updated at each subiteration. For this, the accumulator values are loaded into the shifter and shifted according to the associated permutation matrices, according to Eq. (F4).
  • the corresponding extrinsic information is loaded from an edge memory which can be implemented as a FIFO, where read underflows are treated as reading zero.
  • the extrinsic information is subtracted from the shifted a-posteriori value according to Eq. (F5).
  • the resulting intrinsic value is fed to the CNU unit and to a demultiplexer.
  • the intrinsic value from the demultiplexer is fed to the FIFO. If the submatrix consists of more than one permutation matrix, the first intrinsic value is fed to the FIFO and afterwards the FIFO is fed with the corresponding negative extrinsic values for that submatrix; this prepares the computation inside the brackets of Eq. (F10). Now the CNU computes the new extrinsic values according to Eq. (F6). These values are saved in the edge memory. At the output of the CNU the new extrinsic value is added to the output of the FIFO and passed to the shifter according to Eq. (F7). The output of the shifter is fed to an adder.
  • Two cases have to be distinguished, depending on whether there are one or more permutation matrices in one submatrix. If the submatrix is defined via one permutation matrix, the output value of the shifter is directly loaded into the a-posteriori register (no adding). If the submatrix is defined via more than one permutation matrix, the first value is loaded into the a-posteriori register and the following extrinsic update values are added to that value, as in Eq. (F10). The old a-posteriori sum in the accumulator memory is overwritten with the new a-posteriori value in the a-posteriori register.
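The following Python sketch (using numpy) traces this datapath for one check node group. The data layout, the shift direction and all names are our assumptions; cnu stands for a generic check node update function:

    import numpy as np

    def f_subiteration(acc, edge, submatrices, cnu):
        # acc[bng]: a-posteriori accumulator vector of a bit node group.
        # edge[(bng, shift)]: extrinsic word of one permutation matrix.
        for bng, shifts in submatrices:
            for k, shift in enumerate(shifts):
                shifted = np.roll(acc[bng], -shift)       # shift at accumulator
                intrinsic = shifted - edge[(bng, shift)]  # subtract old extrinsic
                new_ext = cnu(intrinsic)
                if k == 0:
                    # single / first permutation matrix: load the shifted-back
                    # a-posteriori value (intrinsic + new extrinsic) directly
                    result = np.roll(intrinsic + new_ext, shift)
                else:
                    # further permutation matrices: add only the extrinsic
                    # update (new minus old extrinsic), shifted back
                    result += np.roll(new_ext - edge[(bng, shift)], shift)
                edge[(bng, shift)] = new_ext
            acc[bng] = result
        return acc, edge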
  • Figures 14 and 15 show the associated accumulator memory content and edge memory content, respectively.
  • Figure 16 shows inputs and outputs of key elements of the Decoding Principle F hardware. As one can see, in order to update the sum for bit 0 one has to compute


Abstract

Methods and apparatuses are disclosed for decoding LDPC coded data, employing in iteration likelihood values in a memory, check node update steps and symbol node update steps. For accelerating convergence, before performing the check node update, symbol node update steps are performed in which updated likelihood values are used in matrix rows above and likelihood values from a previous iteration are used in matrix rows below. For efficiently decoding structured LDPC coded data, an accumulator is employed for each symbol node, new extrinsic information is computed using the accumulator and the memory, intrinsic information is computed by subtracting old extrinsic information from the accumulator, and a-posteriori information is computed by adding the new extrinsic information to the accumulator. Cyclic shifting is performed at the accumulator.

Description

Method and apparatus for error correction decoding
Field of the invention
The present invention relates to Error Correction Codes, Soft Decision Decoding, Low Density Parity Check Codes (LDPC), DVB-S2, irregular repeat accumulate codes, structured codes.
Background of the invention
Low Density Parity Check (LDPC) codes are among the most powerful error correction codes known today; they achieve very good error correction performance, especially for long codes.
To describe decoding of linear codes, let x represent a received vector assumed to be an N-tuple of symbols from GF(2). A vector y is a valid code word if and only if

    H * y^T = 0,

where H is called the Parity Check Matrix. The goal of decoding is to derive a vector y from each vector x which minimizes the Hamming distance d_Hamming(x, y) between x and y. The Hamming distance is defined as

    d_Hamming(x, y) = |{ i : x_i ≠ y_i }|,

i.e. as the cardinality of the set of those vector components where the two vectors to be compared differ.
In soft decision decoding, each bit is represented by a conditional probability. Specifically, the probability that the bit i of a received vector was zero is denoted as p_i. This probability is called the a-priori probability. Advantageously the bits of the received vector x can be represented by so-called Log-Likelihood-Ratio values or LLRs or likelihood values for short, defined as

    L(i) = ln( p_i / (1 - p_i) ).

This function maps the open interval (0, 1) to the entire set of real numbers R. In order to be able to represent probabilities p in a range of approximately 10^-13 < p < 1 - 10^-13, a number range of at least -30 < L(i) < 30 must be foreseen for integer-valued LLR values. In the following, the a-priori information is called the channel information or channel LLR value and is denoted as L_c^i, where i denotes the bit number within the received vector. Additionally we define a sub vector L_c(n) as

    L_c(n) = ( L_c^{nS}, ..., L_c^{(n+1)S-1} ),

where S is a dimension of sub matrices as explained in the following.
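To make the definition concrete, here is a minimal Python sketch (the function names are ours, not the patent's) that computes an LLR from an a-priori probability and clamps it to the quoted integer range:

    import math

    def llr(p0: float) -> float:
        # L = ln(p0 / (1 - p0)) for the a-priori probability p0
        # that a received bit was zero.
        return math.log(p0 / (1.0 - p0))

    def clamp_llr(value: float, limit: float = 30.0) -> float:
        # Restrict the LLR to [-limit, limit] so it fits the
        # integer-valued representation mentioned above.
        return max(-limit, min(limit, value))

    print(llr(0.9))                    # ~2.197: bit very likely zero
    print(clamp_llr(llr(1 - 1e-14)))  # ~32.2, clamped to 30.0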
The principle of Structured Codes
For structured codes, the parity matrix H is divided into identically sized, square submatrices. The rows and columns are arranged into groups and are called check node groups and symbol node groups, respectively. In DVB-S2, the submatrix dimension is S=360, so that one check node group consists of 360 check equations and one symbol node group consists of 360 columns. The resulting submatrices, sometimes denoted as H(m,n), define the connection between the nodes, and for structured codes they consist of permutation matrices. In case of DVB-S2, permutation matrices are denoted as I^j to indicate an identity matrix I cyclically left shifted j times. Some of the submatrices may as well consist of a sum of two permutation matrices, for which a shorthand notation of I^{i+j} is sometimes used, meaning I^i + I^j with i ≠ j. An example of a permutation matrix of size S=4 is the identity matrix cyclically shifted by one position:

    I^1 =
        0 1 0 0
        0 0 1 0
        0 0 0 1
        1 0 0 0
A parity matrix of a structured code may look like, for instance (shift values chosen for illustration only):

    H = | I^{1+2}  I^0  I^3 |
        | I^2      I^1  I^0 |
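The following Python sketch (using numpy) builds such permutation matrices and assembles the illustrative block matrix above; the shift convention is one possible assumption, since the patent itself notes that both left and right shift conventions occur:

    import numpy as np

    def perm(j: int, S: int = 4) -> np.ndarray:
        # I^j: the S x S identity matrix with each row's 1 cyclically
        # displaced by j columns.
        return np.roll(np.eye(S, dtype=int), j, axis=1)

    def perm_sum(i: int, j: int, S: int = 4) -> np.ndarray:
        # Shorthand I^(i+j), meaning the sum I^i + I^j with i != j.
        assert i != j
        return perm(i, S) + perm(j, S)

    H = np.block([
        [perm_sum(1, 2), perm(0), perm(3)],
        [perm(2),        perm(1), perm(0)],
    ])
    print(H.shape)  # (8, 12): 2 check node groups, 3 symbol node groups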
Another way of representing structured codes is the so-called Tanner graph. A Tanner graph of a structured code is depicted in Figure 4. In the Figure, each of the lines joining a symbol node group and a check node group stands for a number "S" of individual connections between individual check equations and individual symbols.
The idea behind structured codes is that this special structure of dependencies between the check nodes, also called parity nodes, and the symbol nodes, also called bit nodes, allows updating a certain number of parity and symbol nodes simultaneously. The result is a partially parallel hardware architecture, which is a good compromise between memory and complexity. With the structured code approach, long codes can be realised with moderate complexity.
Implementation of long LDPC codes in hardware often becomes the bottleneck of a design. The memory consumption of conventional LDPC decoding is quite high, and it requires a lot of memory accesses. The high number of write accesses yields high power consumption. Even if a structured code is used, the power consumption is often quite high and proportional to the number of iterations.
Invention
The LDPC decoding method and apparatus according to the invention reduce the overall size of memory and reduce the number of write accesses. Hence the power consumption is decreased. Additionally it converges faster, which also contributes to a power reduction.
For an example where a known LDPC decoder for DVB-S2 requires at least a so-called "edge memory" of size 792*360 Log-Likelihood-Ratio or LLR words, the method and apparatus according to the invention require an edge memory of at least 612*360 LLR words. Hence the resulting reduction is approximately 22 percent. Also the write accesses are reduced to approximately 50%.
Let H denote the parity matrix of an LDPC code. Let M denote the number of parities or parity equations or parity digits, K the number of information symbols or information digits and N the total number of symbols or digits in one code word. The row and column weights of H are denoted as w_r and w_c, respectively.
The channel log likelihood ratios of received code word bits cw_i are expressed as L_i, where the log likelihood ratio is defined as

    L_i = ln( p_i / (1 - p_i) ),

with p_i the received conditional probability that information bit cw_i is zero.
Same as the known "Two Phase Message Passing" or TPMP decoding method, LDPC decoding according to the invention uses check node update steps and symbol node update steps, but it does so in a different order, which has consequences for the memory size and the memory access needs.
Let

    F(n) = { m : H_mn = 1 },

corresponding to F(n) = { m : H_mn ≠ 0 } for the associated likelihood matrix, be the set of checks that are connected to variable n = 0,...,N-1, in other words the set of positions of those check equations that depend (among others) on a specific information digit or bit n = 0,...,N-1. Correspondingly, let

    C(m) = { n : H_mn = 1 },

corresponding to C(m) = { n : H_mn ≠ 0 } for the likelihood matrix, be the set of variables that are connected to check m = 0,...,M-1, or in other words the set of positions of those information digits or bits that have influence on a specific check equation m = 0,...,M-1. Also, according to usual mathematical convention, F(n)\m denotes the set F(n) without one specific member m from amongst its member checks or check equations, and correspondingly, C(m)\n denotes the set C(m) without a specific one n of its member variables or information digits or bits.
Check node update or Horizontal step
For each check node m = 0,...,M-1 and for each n ∈ C(m), in other words for the non-zero positions of the rows of the parity check matrix, compute (in the usual tanh formulation)

    R_mn = 2 * atanh( prod_{n' ∈ C(m)\n} tanh( Q_mn' / 2 ) ).

This computation goes horizontally through the matrix; it amounts to first reading the set of existing terms Q_mn' corresponding to a specific row m of the parity matrix H, next calculating a set of terms R_mn therefrom, and then writing the set of terms R_mn back to the same memory positions that held the Q_mn' before.
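As a sketch of this horizontal step, the following Python function applies the tanh rule to one row m. The dictionary-based layout (Q keyed by (m, n), C mapping each check to its variable positions) is our own simplification, not the patent's memory organization:

    import math

    def check_node_update(Q, C, m):
        # Read the terms Q_mn of row m, compute
        # R_mn = 2*atanh( prod_{n' in C(m)\n} tanh(Q_mn'/2) ),
        # then write the R_mn back to the same positions.
        R = {}
        for n in C[m]:
            prod = 1.0
            for n2 in C[m]:
                if n2 != n:
                    prod *= math.tanh(Q[(m, n2)] / 2.0)
            prod = max(-0.999999, min(0.999999, prod))  # keep atanh finite
            R[(m, n)] = 2.0 * math.atanh(prod)
        Q.update(R)  # overwrite in place, as described in the text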
Symbol node update or vertical step
For each symbol node n = 0,...,N-1 and for each m ∈ F(n), in other words for the non-zero positions of the columns of the parity check matrix, compute

    Q_mn = L_c^n + sum_{m' ∈ F(n)\m} R_m'n.

This computation goes vertically through the matrix. In the known TPMP decoding method, the same memory is used to hold the R_mn and the Q_mn terms, so that R_mn is overwritten with the new Q_mn for the next iteration step. The calculation amounts to first reading the set of existing terms R_m'n corresponding to one specific column n of the parity check matrix H, next calculating a set of terms Q_mn therefrom, and then writing the set of terms Q_mn back to the same memory positions that held the R_m'n before.
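A matching sketch of the vertical step in the same simplified layout; the exclusive sum is formed by subtracting each edge's own term from the column total:

    def symbol_node_update(R, F, Lc, n):
        # Overwrite each R_mn of column n with
        # Q_mn = Lc[n] + sum_{m' in F(n)\m} R_m'n.
        total = Lc[n] + sum(R[(m, n)] for m in F[n])
        for m in F[n]:
            R[(m, n)] = total - R[(m, n)]  # excludes check m's own term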
In the known TPMP decoding, first the Horizontal step is performed for all rows of the parity check matrix, and after that the Vertical step is performed for all columns of the matrix. The memory requirement in this case is proportional to the number of non-zero entries in the parity check matrix; hence if we denote as w_c^j the column weight of the j-th column of the parity check matrix, the overall memory consumption can be calculated as

    Mem = sum_{j=0}^{N-1} w_c^j = w_H,

with w_H the weight of the parity check matrix H. For the example of DVB-S2, the memory requirement using this conventional decoding algorithm corresponds to 792*360 = 285120 LLRs.
Assuming we have a regular structured code, i.e. a code with constant column and row weights w_c and w_r, the numbers of memory read and write accesses during the check node updates are w_r*M + N and w_r*M, respectively. During the symbol node updates, again, the numbers of memory read and write accesses are w_r*M + N and w_r*M, respectively, so the total numbers of read and write accesses are 2(w_r*M + N) and 2*w_r*M, respectively.
The idea of the invention is to interpret each check equation in the parity matrix and associated likelihood matrix independently as a turbo code and to apply the Turbo Decoding principle directly. In other words, in a decoding method according to the invention, which employs, in iteration, likelihood values in a memory associated to nonzero elements of the parity check matrix of an LDPC code, and where a check node update step generating check node updated likelihood values is performed for consecutive rows of the parity check matrix, symbol node update steps are performed, before the check node update step for a row of the parity check matrix, in those columns where that row has set elements. In these symbol node update steps the check node updated likelihood values are used from the memory associated with matrix rows above the current row, whereas likelihood values from a previous iteration are used from the memory associated with the matrix rows below the current row. This has the advantage of a faster convergence.
Advantageously, the symbol node update steps are performed immediately before consecutive ones of the check node update steps; this maximises the convergence speedup.
DECODING PRINCIPLE A
We can now define a Decoding Principle A, where after each decoding or evaluating step, the extrinsic information Le_i is accumulated, and the new resulting extrinsic information is then saved to the next extrinsic memory position. This overwrites the information stored there, but this is allowable and possible because that information is not needed any more in the next decoding or evaluating step nor in any of the other subsequent decoding or evaluating steps. As a consequence, the memory requirement is one less than the column weight for each column of the likelihood matrix.
In other words, in a decoding method according to Decoding Principle A, the check node updated likelihood values from any one of the check node update steps are stored using those memory positions that are associated with set elements of the respective next or subsequent row of the parity check matrix.
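A minimal Python sketch of this schedule, assuming a regular code with column weight w_c and a generic check_update function (the data layout and names are our assumptions, not the patent's implementation): each column keeps only w_c - 1 extrinsic values in a ring buffer, and the newest check result overwrites exactly the slot whose content is no longer needed.

    from collections import deque

    def decode_turbo_schedule(rows, Lc, w_c, n_iters, check_update):
        # rows: list of column index lists, one per check equation.
        # Each column buffer holds the w_c - 1 most recent extrinsic
        # values of the *other* checks of that column.
        buf = {n: deque([0.0] * (w_c - 1), maxlen=w_c - 1)
               for n in range(len(Lc))}
        for _ in range(n_iters):
            for row in rows:  # one sub-iteration per check equation
                # Symbol node update immediately before the check update:
                intrinsic = [Lc[n] + sum(buf[n]) for n in row]
                extrinsic = check_update(intrinsic)
                for n, e in zip(row, extrinsic):
                    buf[n].append(e)  # evicts the oldest, unneeded value
        return buf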
DECODING PRINCIPLE B
Decoding Principle A described above constitutes a way to reduce the overall memory consumption employing what can loosely be called a "turbo decoding principle". It is applicable where the sub-matrices of the parity matrix of a structured code all are permutation matrices. However, the parity matrix of the DVB-S2 standard has some sub-matrices which are sums of two permutation matrices. Categorically using the above described Turbo Decoding principle results in problems on those check equation groups where a sub-matrix consists of two or more permutation matrices.
Let us consider a section of the parity matrix and the associated likelihood matrix H, where the sub-matrices have dimension S×S = 4×4, or for short S=4.

[Matrix section not reproduced here: the sub-matrix joining BNG_0 and CNG_0 consists of two permutation matrices, all other sub-matrices are single permutation matrices I^j.]

Not the full matrix H is shown here. In this, BNG_i is the i-th Bit Node Group of variables or information digits {b_iS, ..., b_(i+1)S-1} and CNG_i is the i-th Check Node Group of check digits {c_iS, ..., c_(i+1)S-1}. Each of the sub-matrices joining or relating a Check Node Group to a Bit Node Group consists of one or more cyclically shifted identity matrices I^j, where j identifies the number of shifts. L_{i,j} is the extrinsic information; due to the memory allocation described above, i can be seen as the address and j as the j-th LLR value at that address. In the example shown, the position of the non-zero entries of the sub-matrix for BNG_0 and CNG_0 is defined by two permutation matrices, whereas all other shown sub-matrices of H are each defined by a single permutation matrix of individual shift. The idea of Decoding Principle A above was to interpret each equation as one independent turbo code. That means, for updating one equation the intrinsic information for that code is obtained by adding together the channel LLR values and the extrinsic information of the other "decoders". The result of one decoding step is saved at the location of the next unused information set; this reduces the overall memory size. The buffer for BNG_0 looks like

[Buffer layout not reproduced here.]

and the depth of the buffer corresponds to the column weight minus one, as mentioned above.
In order to decode, the following computation has to be done, where L_x denote the input LLR values of the check equations. For C_0, the following must be accumulated:

[Eq. (B3): channel LLR values plus the extrinsic values of the other decoders, among them L_{1,0} and L_{0,3}.]

For C_1, the following must be accumulated:

[Eq. (B4), the corresponding accumulation for C_1.]
Performing these computations according to Eqq. (B3) and (B4) in parallel results in two difficulties:
First: Accumulating takes place over different words, see L_{1,0} and L_{0,3} in Eq. (B3). This is not really a difficulty and can be solved. Second: The result from C_0 is directly used or needed in the accumulation step of C_1.
In order to cope with the second difficulty, according to Decoding Principle B the decoding algorithm is extended. The extension consists in not using the output of C_0 in the other decoder, C_1. That means that instead of computing Eqq. (B3) and (B4), one computes

[Eqq. (B5) and (B6), in which both check equations use the same input values.]

Hence the first input values, namely those of check equations C_0 and C_1, are the same, like in the first iteration, where all check nodes are fed with the same input values L_c. By using this approach the data flow of extrinsic information from C_0 to C_1 is avoided, and the extrinsic information from C_0 flows to C_1 via C_4 and C_9 and so on. This amounts to appropriately mixing the decoding approach known as "Two Phase Message Passing" or "TPMP" with the Decoding Principle A, also called "Turbo Decode Message Passing" or "TDMP" in the following.
For this, according to Decoding Principle B, an additional buffer is introduced for those locations where one sub-matrix is identified via two or more permutation matrices. With that additional buffer it is possible to use the normal two phase message passing approach on those locations. With this extension, we have a bit accurate implementation of the Tanner graph, while the performance is still increased compared with the known two phase message passing method. The idea here is to mix the turbo decoding principle with the two phase message passing principle. Submatrices which are defined via one permutation matrix are decoded via the described Turbo Decoding principle, and submatrices which are defined via two or more permutation matrices are decoded using the known Two Phase Message Passing. Of course, the overall memory size is slightly increased according to the number of sub-matrices identified via two or more permutation matrices.
In other words, in a decoding method according to Decoding Principle B, when applied on a structured code whose parity check matrix has submatrices consisting of one or more permutation matrices, for those sub-matrices consisting of two or more permutation matrices an additional buffer of a size equalling the number of permutation matrices minus one is provided, and the intrinsic information is computed by the second step of the two phase message passing, using the additional buffer of the sub-matrix.
The general idea of method B is applicable to any LDPC decoding where at least some of the submatrices are defined by more than one permutation matrix.
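For illustration only, the two ways of obtaining the intrinsic input L_x under Decoding Principle B can be sketched as follows (Python, with scalar stand-ins for the S-wide memory words; all names and values are our own, not part of the specification):

Lc    = 0.7                # channel value of the bit node (scalar stand-in)
P     = [0.2, -0.1, 0.4]   # edge buffer of the column group, depth w_c - 1
B     = 0.3                # additional buffer (sub-matrix with two permutations)
L_own = 0.2                # extrinsic value of the permutation being processed

# Sub-matrix defined via ONE permutation matrix -> turbo decoding principle:
Lx_turbo = Lc + sum(P)

# Sub-matrix defined via TWO or more permutations -> second step of TPMP:
L_post  = Lc + sum(P) + B       # a-posteriori value from the buffers
Lx_tpmp = L_post - L_own        # subtract the node's own extrinsic contribution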
DECODING PRINCIPLE C
The decoding architecture of Decoding Principle B can be seen as being less suitable for applications requiring a high data throughput. The following therefore describes another extension into a Decoding Principle C that is especially suited for high throughput applications. A high number of memory read accesses can be seen as a disadvantage of Decoding Principle B, so its architecture is changed in order to make it applicable for high throughput applications. The idea of Decoding Principle C is to reduce the high number of read accesses that occur when mixing the so-called "TPMP" and "TDMP". In order to avoid recomputing the intrinsic or input information of the next check node group, an accumulator is used. It is initialized with the associated channel information. Before overwriting the buffer with the decoding results, the decoder result is added to the accumulator and the buffer contents of the next check node group is subtracted. After that, the decoding result is saved on the buffer location of the next check node group.
From a mathematical point of view, Decoding Principle C results in the same performance and the same computation as Decoding Principle B, but the buffer handling and timing scheme is different. Therefore it requires more memory, albeit less than conventional two phase message passing. The main advantage is the fast convergence and the high throughput.
In other words, in a decoding method according to Decoding Principle C, an accumulator is used: the accumulator is initialized with the associated channel information; the check node updated likelihood values are added to the accumulator, and the contents at the memory positions associated with set elements of the subsequent row of the parity check matrix are subtracted from the accumulator; then the check node updated likelihood values are stored to the memory positions associated with set elements of the subsequent row of the parity check matrix.
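For illustration only, this accumulator handling can be sketched for a single bit as follows (Python, scalar stand-ins; the names are our own and the check node computation itself is omitted):

Lc  = 0.7                         # channel information of one bit (scalar stand-in)
acc = Lc                          # accumulator initialised with the channel value
buf_next = 0.0                    # buffer location of the next check node group

def subiteration(decoder_result):
    """One Principle C buffer/accumulator update for a single bit."""
    global acc, buf_next
    acc += decoder_result          # add the fresh check node result ...
    acc -= buf_next                # ... subtract the next group's old buffer contents
    buf_next = decoder_result      # ... then store the result on that location

subiteration(0.4)                  # acc now holds the intrinsic input of the next group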
DECODING PRINCIPLE D
The idea of Decoding Principle D is to apply the turbo decoding principle to single steps of the known "Two Phase Message Passing" or "TPMP" decoding approach without a reduction of memory. Decoding Principle C, above, is applicable for high speed applications and achieves some memory reduction compared to a straightforward TPMP implementation. But it can be seen to have the disadvantage of being applicable only for structured LDPC codes where some submatrices of the Parity Check Matrix are defined via one permutation matrix and others via more than one permutation matrix.
In Decoding Principle D it has been found that the idea of the turbo decoding principle can, on its own, be advantageously applied within the Two Phase Message Passing (TPMP) approach. The basic idea is to transfer extrinsic information from one equation to the next equation. Decoding Principle D does not aim at memory reduction, but it can advantageously be applied to every LDPC code, whereas Decoding Principle C is only applicable for structured codes where sub-matrices consist of one or more permutation matrices.
From a mathematical point of view Decoding Principle D realises the same computation as Decoding Principles B and C. For each bit an accumulator is used, for which a buffer is necessary. The main advantage is that it is applicable for every LDPC code, but it requires more hardware and memory than B and C.
In order to avoid recomputing the intrinsic or input information to be used in the calculations of the next check node group, accumulators together with accumulator buffers are used. In other words, a method for decoding structured LDPC coded data according to Decoding Principle D performs a decoding subiteration for each of the check node groups, computes for each Check Node Group new extrinsic information using the accumulator values and information from the edge memory, computes the intrinsic information of each Check Node Group for each sub-matrix by subtracting the old extrinsic information associated with the permutation matrices constituting the sub-matrix from the accumulator, and computes an a-posteriori information by adding the new extrinsic information to the accumulator.
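For illustration only, one such sub-iteration can be sketched for a single bit as follows (Python, scalar stand-ins; the check node update is left as a placeholder and all names are our own):

Lc   = 0.7                 # channel information of one bit (scalar stand-in)
acc  = Lc                  # accumulator = a-posteriori information
edge = [0.0, 0.0]          # edge memory: one location per extrinsic value of this bit

def subiteration(m, check_node_update):
    """One Principle D sub-iteration touching edge m of this bit."""
    global acc
    Lx = acc - edge[m]                 # intrinsic input: accumulator minus old extrinsic
    new_extr = check_node_update(Lx)   # new extrinsic information for this edge
    acc = acc - edge[m] + new_extr     # update the a-posteriori value
    edge[m] = new_extr                 # save back to the same edge memory location

subiteration(0, lambda x: 0.5 * x)     # placeholder for the real check node update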
DECODING PRINCIPLE E
Applying a "turbo decoding principle" to the entire parity matrix calculation leads to highly complex timing schemes, as seen in the previous Decoding Principles. Often such a high throughput and convergence speed is not necessary; something in between suffices. Decoding Principle E applies the turbo decoding principle to the TPMP basis in a different way and without increasing the complexity of the system. Even so, it increases the convergence speed compared to TPMP and decreases the overall memory consumption.
Decoding Principle E is based on the fact that the parity check matrix H of the LDPC code of DVB-S2 can be described as a juxtaposition of three partial matrices:
H = ( A^(1) A^(2) T )   (E1)
The (j) in A^(j) is an index and not a "power-of" operation. The sizes of all three matrices depend on the code configuration of the LDPC code. The row weight w_r of H (the number of ones in a row) is constant in one code configuration and depends on the code configuration. The column weight of A^(1) is w_c and also depends on the code configuration. The column weight of A^(2) always is w_c = 3 and is constant in all code configurations. The important fact to build upon is that the column weight of T invariably is w_c = 2, regardless of the code configuration. The matrix T is referred to as the Triangular part of H.
According to Decoding Principle E, the turbo decoding principle as described above, i.e. performing symbol node update steps immediately before performing the check node update step, is only applied on T , i.e. on those columns that belong to T. As mentioned, the column degree of T is 2.
§ Conventional or TPMP decoding of T would imply that first all check equations are decoded and the resulting extrinsic information L_e is saved in the edge memory. In the subsequent accumulation, for each column the a-priori information and all extrinsic information are added together. The resulting new intrinsic information to be used in the next iteration is computed and saved in the edge memory. This would imply that one edge buffer is needed for each L_e, so that at the given column weight 2 of T, two buffers per column are needed.
§ With Decoding Principle E, only one buffer is needed for each column of T . The intrinsic information in one column is computed anew for each check equation. This is done by accumulating the a-priori information and the extrinsic information in each decoding step inside a column of T; the new resulting extrinsic information replaces the old extrinsic information. In that way the new resulting extrinsic information is directly used in the next check equation, which results in a faster convergence.
In summary, Decoding Principle E has the following effects:
§ It needs less memory: for DVB-S2, a conventional decoder needs to store all 792 edges, whereas with E, only 640 edges need to be stored.
§ It has a faster convergence, which leads to a higher throughput.
§ It needs less computation time per iteration, which also contributes to the higher throughput.
§ It has a much easier timing scheme.
Let us consider the following parity matrix:
[parity check matrix image]
Conceptually, using this parity check matrix for decoding can be visualised with a Log-Likelihood matrix or LLR matrix, where one LLR value is associated to every set element of the parity check matrix. For convenience, the LLR matrix is also called H here and in the following. It is clear from the respective context whether H denotes the parity check matrix proper or the LLR matrix.
[LLR matrix image]
In this, the first column group SNG_0 represents the matrix part A^(1), the second column group SNG_1 represents the matrix part A^(2), and the following column groups SNG_j, j = 2,3,4 represent the matrix part T. Note that this example has a different dimension than the code of DVB-S2. The submatrix denoted as T* is a special structure, namely
[matrix image]
i.e. a non-cyclically left shifted identity matrix.
In other words, a method for decoding structured LDPC coded data according to Decoding Principle E performs the symbol node update steps immediately before consecutive ones of the check node update steps only in contiguous columns of the parity check matrix that have column weight 2, uses a single buffer for each of the contiguous columns, computes the intrinsic information anew for each row of the parity check matrix by column-wise accumulating the a-priori information and the extrinsic information and by writing the new resulting extrinsic information over the old extrinsic information, and uses the new resulting extrinsic information in the next check node update step.
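For illustration only, the single-buffer handling for one column of T can be sketched as follows (Python, scalar stand-ins; the check node update is left as a placeholder and all names are our own):

Lc  = 0.7        # a-priori (channel) value of one column of T (scalar stand-in)
mem = 0.0        # the single buffer of this T column, holding the extrinsic value

def check_step(check_node_update):
    """One check equation touching this column of T (column weight 2)."""
    global mem
    intrinsic = Lc + mem                # rebuild the intrinsic value anew
    mem = check_node_update(intrinsic)  # overwrite: the new extrinsic value is
                                        # used directly by the next check equation

check_step(lambda x: 0.5 * x)           # placeholder for the real check node update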
DECODING PRINCIPLE F
According to Decoding Principle F, a principle similar to turbo decoding is applied on nodes wherever possible, and a principle similar to two phase message passing is applied on nodes in those submatrices that are defined via two or more permutation matrices. The way this is done results in a straightforward timing scheme. It also allows a partitioning of the computation cores and therefore simplifies the routing inside an integrated circuit realizing the Decoding Principle.
In summary, Decoding Principle F has at least the following advantages:
- same memory consumption as conventional decoders (for DVB-S2, storing 792 groups of 360 edges each in a FIFO);
- allows to partition the computation units and edge memories;
- the check processing units can be located at the memories;
- faster convergence => higher throughput;
- less computation time per iteration => higher throughput;
- easy timing scheme.
Assuming the following parity check matrix and associated LLR matrix:
H = [parity check matrix image]   (F1)
[LLR matrix image]
NB: Here, Eq. (F1) assumes the permutation matrix I^1 to be defined as a cyclic right shift. NB2: Eq. (F1) and following, illustrating the case of binary symbols, use "Bit Node Group" (BNG) as a synonym for "Symbol Node Group" (SNG). Again, the LLR values are stored in a so-called "edge memory", where one memory word wide enough to store S = 4 LLR values is provided and associated to each permutation matrix that is a constituent of the LLR matrix. But according to Decoding Principle F they are stored in a different order. For the above example, the edge memory layout results as:
[edge memory layout image]
In other words, as before, one memory place exists for each entry in the LLR matrix, and the overall memory consumption is 7·S LLR values. But, specifically for Decoding Principle F, with respect to the order in which the LLR values are stored in the cells of the edge memory,
§ the first element at each address of the edge memory corresponds to the first sub equation of the associated bit node group, and
§ the LLR values corresponding to the parity check matrix are ordered row wise, i.e. the i-th element at any address contains the LLR value in the i-th row of the permutation matrix associated to the address.
Decoding Principle F amounts to mixing TPMP and TDMP using a- posteriori summation.
The Decoding Principle F is applicable for all LDPC codes, even for those where some submatrices of the parity check matrix are defined by more than one permutation matrix.
In other words, a method for decoding structured LDPC coded data according to Decoding Principle F uses an edge memory organised in words associated to the permutation matrices constituting the submatrices of the parity check matrix, each edge memory word comprising likelihood elements to store one likelihood value for every bit node of a bit node group. In the first likelihood element of each edge memory word, the likelihood corresponding to the first sub-equation of the associated bit node group is stored; in the i-th likelihood element, the likelihood associated to the i-th row of the permutation matrix associated to the edge memory word is stored; and during computation of the intrinsic information, a cyclic shift of likelihood values is performed at the accumulator.
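For illustration only, this memory organisation can be sketched as follows (Python; the container layout and names are our own, chosen merely to mirror the ordering rules just described):

S = 4
edge_memory = [[0.0] * S for _ in range(7)]   # one S-wide word per permutation matrix

def store_word(addr, row_values):
    """row_values[i] is the LLR belonging to the i-th row (sub-equation)
    of the permutation matrix stored at this address."""
    edge_memory[addr] = list(row_values)

def shift_at_accumulator(word, j):
    """Cyclic shift applied at the accumulator while computing intrinsics."""
    j %= S
    return word[j:] + word[:j]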
The following Figures, together with the subsequent description, help to understand the invention.
Fig. 1 shows an example decoder having 3 check equations;
Fig. 2 shows the structure of the parity matrix of DVB-S2;
Fig. 3 shows an architecture for Turbo decoding according to the invention for the case of unstructured code;
Fig. 4 shows a Tanner graph of a structured LDPC code;
Fig. 5 shows decoding according to the invention for the case of structured code;
Fig. 6 shows a principle diagram of computing the extrinsic information;
Fig. 7 shows the buffer contents at successive iterations and subiterations;
Fig. 8 shows the buffer contents for Decoding Principle C at the times of iteration and sub-iteration;
Fig. 9 shows the accumulator values in the iteration and sub-iteration steps;
Fig. 10 shows the final decision values at the end of the iteration;
Fig. 11 shows the block diagram of an LDPC decoder hardware according to Decoding Principle E;
Fig. 12 shows an example of a memory initialization;
Fig. 13 shows a suggested hardware architecture for Decoding Principle F;
Fig. 14 shows accumulator memory content for an example of Decoding Principle F;
Fig. 15 shows edge memory content for an example of Decoding Principle F;
Fig. 16 shows inputs and outputs of key elements of the Decoding Principle F hardware.
The conventional decoding algorithm for LDPC codes is described in:
[1] Robert G. Gallager, "Low-Density Parity Check Codes", 1963
[2] D. MacKay, R. Neal, "Good Codes Based On Very Sparse Matrices", Cryptography and Coding, 5th IMA Conf., Lecture Notes in Computer Science, 1995
DECODING PRINCIPLE A
Fig. 1 shows 3 check equations corresponding to 3 rows of a parity check matrix. All of Lc, Li, Le0, Le1 and Le2 are sets of LLR values. In conventional decoding, first all check equations are decoded or evaluated, for instance using dedicated Check Processing Units, also denoted as CPUs. The resulting extrinsic information Le(i) is saved in a so-called edge memory. This corresponds to "step I" as described above. Afterwards the accumulation is performed, and the new intrinsic informations Li are updated and saved again in the edge memory, corresponding to step II. Hence an edge buffer is needed for each of the sets Le0, Le1 and Le2, corresponding to a total of three buffers.
In the decoding method and apparatus according to Decoding Principle A, only two edge buffers are needed. Separately, after each decoding or evaluating step, the extrinsic information Le_i is accumulated, and the new resulting extrinsic information is then saved to the next extrinsic memory position. For the example of decoding check equation 0, the channel information Lc and the extrinsic informations Le1 from check equation 1 and Le2 from check equation 2 are accumulated. After decoding or evaluating check equation 0, the resulting extrinsic information Le0 is saved to the position of Le1. Of course, this overwrites the previous decoding information of Le1, but this is allowable and possible because that previous decoding information of Le1 is not needed any more in the next decoding or evaluating procedure of check equation 1, nor in any of the other subsequent decoding or evaluating procedures.
Correspondingly, in the next decoding or evaluating procedure of check equation 1, the resulting extrinsic information Le1 is saved to the position of Le2, and so on.
Advantageously, this results in less memory and in fewer write accesses to the buffers. As a further advantage, due to the fact that the decoding results of previous check equations are used immediately, i.e. already in evaluating the next check equation, the algorithm converges faster than conventional decoding, which also contributes to decreasing the number of buffer write accesses.
The decoding according to Decoding Principle A of the invention
The decoding is best explained with an example:
For the sake of illustration, we assume here an irregular and unrealistic code with column weights w_c_0 = 4 and w_c_j = 3 for j = 1,...,4, row weights w_r_0 = 3, w_r_1 = 4, w_r_2 = 5, w_r_3 = 4, and K = 1, M = 4, N = 5. The associated parity check matrix is:

      1 1 1 0 0
H  =  1 1 0 1 1      (4)
      1 1 1 1 1
      1 0 1 1 1

• Storage
For each column j we have one memory buffer P_j, the size of which corresponds to the column weight minus one. Hence N = 5 memory buffers of size {3,2,2,2,2}, denoted by P_0, P_1, P_2, ..., P_4, are needed for the example of Eq. (4); their content is initialized to zero. In the following, P_j^i shall denote the i-th element of P_j. To the N = 5 memory buffers, a set of N = 5 counters ctr_j is associated. Each of these counters points to the head of its associated buffer P_j and is initialized with ctr_j := 0. Individual counters are needed because of the individually different sizes of the buffers. Even if after initialization the counters all start at 0, since each of them points to a position that cyclically traverses all positions of the associated buffer, they will lose synchronism very soon, and regain it only occasionally. In total, we have the following memory locations or cells: P_0^0, P_0^1, P_0^2, P_1^0, P_1^1, P_2^0, P_2^1, P_3^0, P_3^1, P_4^0, P_4^1, which is a total of 11 locations or cells. This shows the memory reduction, keeping in mind that the parity matrix has a total of 16 non-zero entries.
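For illustration only, this storage scheme can be sketched as follows (Python; the function and variable names are our own and the sketch is not part of the specification):

H = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1],
]
N = len(H[0])
col_weight = [sum(row[j] for row in H) for j in range(N)]  # [4, 3, 3, 3, 3]
P = [[0.0] * (wc - 1) for wc in col_weight]                # buffer sizes {3,2,2,2,2}
ctr = [0] * N                                              # head pointers, all start at 0

def save(n, Q_n):
    """Write a decoding result to the head of buffer P_n and advance the pointer."""
    P[n][ctr[n]] = Q_n
    ctr[n] = (ctr[n] + 1) % len(P[n])    # cyclic advance modulo (w_c - 1)

print(sum(len(p) for p in P))            # 11 cells for 16 non-zero matrix entries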
• At each decoding iteration k, as many decoding sub-iterations are carried out as there are check equations in case of decoding an unstructured code, or as there are check equation groups (check node groups) in case a structured code is being decoded.
o For each decoding sub-iteration t, t = 0,...,M−1, the extrinsic informations Q_n, n = 0,...,4 are computed using the channel information L_i and the extrinsic information saved in P_j:
[equation image]
The calculated Q_n are saved in P_n^(ctr_n) and the pointer is advanced: ctr_n = (ctr_n + 1) mod (w_c_n − 1).
§ Example iteration k = 0
• Subiteration k = 0, t = 0 (check equation 0, corresponding to the first line of the parity check matrix H):
[equation image]
Save Q_n in P_n^(ctr_n) and advance ctr_n = (ctr_n + 1) mod (w_c_n − 1): ctr_0 = 1, ctr_1 = 1, ctr_2 = 1.
[buffer contents image]
• Subiteration k = 0, t = 1 (check equation 1, 2nd line of parity matrix):
[equation image]
Save Q_n in P_n^(ctr_n) and advance ctr_n = (ctr_n + 1) mod (w_c_n − 1):
[buffer contents image]
ctr_0 = 2, ctr_1 = 0, ctr_3 = 1, ctr_4 = 1
• Subiteration k = 0, t = 2 (check equation 2):
[equation image]
• Subiteration k = 0, t = 3 (check equation 3):
[equation image]
• Repeat the last step for the next iteration step k = k + 1. Also, perform interleaving if a structured code is used.
• Final Iteration
o Compute the a-posteriori information Q_n, n = 0,...,4:
[equation images]
o Make hard decisions x_n = sgn(Q_n), n = 0,...,4.
Memory requirement of the decoding according to the invention
The memory requirement depends on the number of ones or non-zero entries per column of the parity check matrix H. If, in the parity matrix, we denote as w_c_j the column weight of the j-th column, the overall memory consumption can be calculated as
mem = Σ_{j=0}^{N−1} (w_c_j − 1)
For the example of DVB-S2, the memory requirement using the decoding according to the invention corresponds to 612·360 = 220320 LLRs. The memory reduction against a conventional decoding is N LLR words, here N = 180.
Memory bandwidth or number of write accesses
Assuming we have a regular structured code with column and row weights w_c and w_r, the number of memory read accesses per subiteration is w_c·w_r. The number of write accesses per subiteration is w_r. Since each iteration consists of M subiterations, the total numbers of read and write accesses are w_c·w_r·M and w_r·M, respectively. This compares to the total numbers of memory read and write accesses in conventional decoding, which are 2(w_r·M + N) and 2·w_r·M, respectively.
For the example of DVB-S2, the code is structured and has q·180 submatrices, where q represents the number of row groups and there are 180 column groups. Each submatrix is processed in parallel. The width of one word corresponds to the number of ones in the permutation matrices and is in this case S = 360.
Figure 2 shows the structure of the parity matrix of DVB-S2.
According to the invention, we have the following memory accesses:
Read: 3486
Write: 11·72 = 792
Overall: 792 + 3486 = 4278
With the conventional decoding, we have the following memory accesses:
Read: 2(11·72 + 180) = 1944
Write: 2·11·72 = 1584
Overall: 1584 + 1944 = 3528
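These counts can be verified with elementary arithmetic; the following throwaway check (Python, our own illustration using the numbers quoted above) reproduces them:

w_r, M, N = 11, 72, 180        # row weight, row groups, column groups of the example
write_new  = w_r * M                    # 11*72 = 792 write accesses
read_new   = 3486                       # read accesses as computed above
read_conv  = 2 * (w_r * M + N)          # 2*(792+180) = 1944
write_conv = 2 * w_r * M                # 1584
print(write_new + read_new)             # 4278
print(read_conv + write_conv)           # 3528
print(write_new / write_conv)           # 0.5, i.e. roughly half the write accesses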
One can see that the reduction in write accesses is approximately 50 percent.
Fig. 3 shows a hardware architecture of decoding according to the invention for the case of unstructured code.
The memory holds the channel LLR values and the extrinsic information. The edge configuration memory holds the parity check matrix configuration; it constitutes or realises the sparse parity check matrix in a compact memory. The accumulator is simply an adder, which accumulates extrinsic information and can be reset by the controller. The controller controls the process flow of all units. The hard decision unit performs the hard decision according to the sign of the LLR value. The CPU (check processing unit) computes the extrinsic information; it can be implemented using tanh and its approximation, or using BCJR [3]/MAP/log-MAP based on a trellis structure.

[3] L. R. Bahl, J. Cocke, F. Jelinek, J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate", IEEE Trans. Information Theory, vol. IT-20, pp. 284-287, Mar. 1974.
• Storage
For each column we have one memory buffer, the size of which corresponds to the column weight minus one. Hence N memory buffers are denoted as P_0, P_1, P_2, ..., P_{N−1} and are initialized to zero. P_j^i denotes the i-th element of P_j. A set of N counters that point to the heads of these buffers is initialised as ctr_j := 0. Hence we have the following memory locations:
P_0^0, P_0^1, ..., P_0^(w_c_0 − 2), P_1^0, P_1^1, ..., P_1^(w_c_1 − 2), ..., P_{N−1}^0, P_{N−1}^1, ..., P_{N−1}^(w_c_{N−1} − 2)
• Iteration Step
o Compute sub-iteration t = 0,...,M−1, one sub-iteration of each equation:
§ Initialize the accumulator to zero.
§ Compute the extrinsic informations Q_n using the channel information L_i and the extrinsic information stored in memory P_n:
[equation image]
§ Save Q_n in P_n^(ctr_n) and advance ctr_n = (ctr_n + 1) mod (w_c_n − 1):
[equation image]
§ Repeat that step according to the number of equations.
• Final Iteration
o Compute the a-posteriori informations Q_n:
[equation image]
o Make hard decisions x_n = sgn(Q_n).
Hardware architecture of decoding according to the invention for the case of structured code
The longer a code is, the more efficient it is. Only long codes come close to the Shannon bound. The idea of a structured code is to design a code in which several parity and variable nodes are updated simultaneously. The result is a partially parallel hardware. The resulting hardware is a good compromise between memory requirement and computational complexity. The idea behind that structure is to divide the variable nodes and check nodes into groups of size S. That means that the parity check equation is divided into submatrices; the submatrices are connected to other submatrices via edges. The submatrix itself defines the permutation π_i between the nodes.
Fig. 4 depicts the Tanner graph of such a code.
Such structured codes can also be expressed via a parity matrix
[parity check matrix image]
which links the symbol node groups and the check node groups by permutation submatrices. Each check node group (CNG) and each symbol node group (SNG) consists of S = 4 check nodes and S = 4 symbol nodes, e.g. SNG_0 consists of the bits s_0, s_1, s_2, s_3. Hence each submatrix of H is of size 4×4. With I^i we denote a permutation matrix, which is obtained by a number of i left cyclic shifts of an identity matrix of size S = 4. It is also possible to define one entry of H via two permutation matrices like I^(i+j), which is a shorthand notation for I^i + I^j, where i ≠ j. Example:
[equation image]
The principle now is to update S = 4 check nodes or symbol nodes simultaneously. Each memory word consists of S = 4 LLR values, which corresponds to one permutation matrix. Considering the check node update of group 0, we get
[equation image]
The meaning of these matrices is that in check equation C_0, the LLR values of s_3 have to be combined with those of s_6; in C_1, the LLR values of s_0 have to be combined with those of s_1, and so on. In hardware the LLR values of one permutation matrix are saved into one word, and the memory is organized as follows:
[memory layout image]
Here, the bit width of one word is S·nb_bits_per_LLR. Assuming 6 bits per LLR value, the total bit width of one word in our example is 4·6 = 24 bits. For updating check node group 0, word_0 has to be cyclically shifted right by one symbol and combined with word_1, which is cyclically shifted right by two symbols. Both words are combined using S = 4 check node update units; the results are shifted back and saved in memory, as sketched below.
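For illustration only, the shift, combine and shift-back mechanics for one pair of words can be sketched as follows (Python; the tanh-rule combination is one possible realisation of the check node update units, the word contents are made-up values, and the exclusion of each edge's own contribution is omitted for brevity):

import math

S = 4

def shift_right(word, i):
    """Cyclic right shift of an S-wide LLR word by i symbols."""
    i %= S
    return word[S - i:] + word[:S - i]

def boxplus(a, b):
    """Pairwise check node combination (tanh rule)."""
    t = math.tanh(a / 2.0) * math.tanh(b / 2.0)
    t = max(min(t, 0.999999), -0.999999)   # numerical guard
    return 2.0 * math.atanh(t)

word0 = [1.2, -0.5, 0.8, 2.0]              # illustrative LLR words
word1 = [0.3, 1.1, -0.7, 0.9]
w0 = shift_right(word0, 1)                 # align word0 (permutation offset 1)
w1 = shift_right(word1, 2)                 # align word1 (permutation offset 2)
combined = [boxplus(a, b) for a, b in zip(w0, w1)]   # S parallel update units
back0 = shift_right(combined, -1)          # shift back ...
back1 = shift_right(combined, -2)          # ... and save to memory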
This procedure can also be performed sequentially, due to the fact that the computation in the CPU is associative.
[equation image]
The update procedure is performed for each check node group. Decoders for structured or architecture-aware codes are easily obtained from those of normal decoding by using additional interleavers and deinterleavers.
Fig. 5 depicts an appropriate hardware for decoding structured code using the decoding according to the invention.
The decoding algorithm is:
• Storage
For each column we have one memory buffer, the size of which corresponds to the column weight minus one. The word width corresponds to the submatrix size S. Hence the resulting bit width of one word is S·nb_bits_per_LLR. Physically, the memory buffers can be in one physical memory.
• At iteration k: Carry out decoding sub-iterations according to the number of check node groups.
o Perform sub-iteration t: Compute for each check node group the extrinsic informations Q_n using the channel information L_i and the extrinsic information saved in P_j:
[equation image]
Save Q_n in P_n^(ctr_n) and advance ctr_n = (ctr_n + 1) mod (w_c_n − 1).
§ This computation is done in the following procedure:
• Initialize the CPU units to zero.
• For each entry in the set i ∈ C(m), compute:
§ Initialize the accumulator with L_i.
§ Feed the accumulator with all LLR values contained in buffer P_i.
o Perform interleaving by shifting the accumulator result according to the permutation sub-matrix.
o Feed the CPU units and update.
• For each entry in the set i ∈ C(m), compute:
o Perform deinterleaving by shifting the result of the CPU units according to the permutation sub-matrix.
o Save the result and update (advance) the pointer.
• Repeat what has been described for iteration k for the next iteration k = k + 1; perform interleaving, because a structured code is assumed here.
• Final Iteration
o Compute the a-posteriori informations Q_n of group n:
[equation image]
• Make hard decisions for each bit of group n: x_n = sgn(Q_n).
DECODING PRINCIPLE B
In order to understand Decoding Principle B, the following parity matrix is decoded exemplarily with the known "Two Phase Message Passing" or "TPMP" and with the Decoding Principle B or "TDMP", i.e. the mixture of Decoding Principle A and TPMP. The example again uses a section of the parity matrix and the associated likelihood matrix H, where the sub-matrices have dimension S = 4.
[parity check / LLR matrix image]
We again use F(n) and C(m) as defined above; the channel values of the i-th variable are denoted as L_c_i.
B.1 Decoding with conventional two phase message passing
For each permutation matrix contributing to the sub-matrix pattern of H, one memory word is foreseen in a so-called edge memory, the memory word consisting of S = 4 Log-Likelihood-Ratio or LLR words. For the above parity matrix, the following memory configuration results:
[memory configuration image]
Hence for each entry in the parity matrix one memory place L_{m,n} exists, L_{i,j} denoting the j-th LLR value of the extrinsic information stored at address i. The overall memory consumption in this example is 7·S LLR values.
B.1.1 Initialization:
Each variable node n is assigned an a-posteriori LLR value. For every position (m,n) where H_{m,n} ≠ 0, i.e. such that H_{m,n} = 1,
[equation image]
B.1.2 Phase (I), the so-called Horizontal Step:
For each check equation or matrix row c_m, m = 0,...,M−1, and for each n ∈ C(m), compute
[equation image]
where k denotes the time or iteration step.
In practice a subset of S = 4 equations is computed at the same time, e.g. updating the 4 equations of the first check node group simultaneously. This can be achieved by reading the memory and cyclically shifting the memory words, which are then fed to the appropriate check node processing units. After computation the results are shifted back and saved.
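For illustration only, one such check node update over the inputs of a single check equation can be sketched as follows (Python; the tanh rule shown is one standard realisation of the check processing unit, and the names and example values are our own):

import math

def check_node_update(inputs):
    """For each edge, combine all OTHER inputs of the check equation
    (sum-product horizontal step, tanh rule)."""
    out = []
    for i in range(len(inputs)):
        p = 1.0
        for j, v in enumerate(inputs):
            if j != i:
                p *= math.tanh(v / 2.0)
        p = max(min(p, 0.999999), -0.999999)   # numerical guard
        out.append(2.0 * math.atanh(p))
    return out

print(check_node_update([1.5, -0.8, 2.2]))     # three symbols in one check equation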
B.1.3 Phase (II), the so-called Vertical step:
For each symbol node n = 0,...,N−1, and for each m ∈ F(n), compute
[Equation (B10) image]
where k denotes the time or iteration step. The computed new L_{m,n}^(k) then serve as the input of Phase (I). For example, the new input values for Phase (I) of the equation for C_0 and Bit Node Group BNG_0 are:
[equation image]
For C_1, we have
[equation image]
In practice, S = 4 intrinsic informations are updated at the same time, but this time no shifting is necessary; see the memory configuration above.
B.1.4 Decision:
For the final decision, all a-posteriori probabilities are to be included in the final computation, hence also the variable node itself. For each n = 0,...,N−1, and for each m ∈ F(n), compute
[equation image]
The decision is made on the sign of R_n, i.e. u_n = 1 if R_n < 0. If uH^T = 0, then the algorithm is halted with u as the decoder output; otherwise processing is continued by starting another iteration at step or phase (I).
B.2 Extended decoding "B", mixing two phase message passing and turbo decoding principle
For each column group we have one memory buffer, the depth of the memory buffer corresponds to the column weight minus one.
For the example this corresponds to N = 2 memory buffers of size {3,3}, denoted by P_0 and P_1 and initialized to zero. P_j^i denotes the i-th element of P_j. The bit width or word size corresponds to the dimension of the sub-matrix times the number of bits per LLR value; in our case one memory word consists of S = 4 LLR values. A set of N = 2 counters that point to the heads of these buffers is initialised as ctr_j := 0. For the first column group, we have the following memories:
[buffer contents image]
For the second column group we have the following memories:
[buffer contents image]
In this, t designates the sub-iteration, and k again designates the iteration step.
Exactly for those sub-matrices where we have two or more permutation matrices we introduce an additional buffer, the buffer size corresponding to the number of permutation matrices minus one. Let B_i denote the buffer for the i-th sub-matrix having two or more permutation matrices; the depth of the buffer is one. In our example we have one additional buffer:
[buffer contents image]
The overall memory consumption in this example is 6·S LLR values. Hence the memory reduction in this simple example is one memory word, but it can be very large if only few additional buffers have to be introduced.
Let here F(n) = {m : H(m,n) ≠ 0} be the set of check node groups that are connected to a bit node group x_n, n = 0,...,(N/S − 1), and let C(m) = {n : H(m,n) ≠ 0} be the set of bit node groups that are connected to a check node group c_m, m = 0,...,(M/S − 1). The size of each sub-matrix H(m,n) is S×S.
• At iteration k: Carry out 3 decoding sub-iterations corresponding to the number of check node groups.
o Perform sub-iteration t, t = 0,...,(M/S − 1). Compute for each check node group the extrinsic informations
[equation image]
using the channel informations L_c_BNG_n and the extrinsic information saved in P_j or B_j. For this we first have to compute the intrinsic or input information of each check node group for each sub-matrix; we denote this information as L_x. During computation of L_x it has to be distinguished whether the extrinsic information of L_x belongs to a sub-matrix with one or with more than one permutation matrices. Perform for each entry in the check node group the following computation.
§ Principle of turbo decoding:
If bit node group i from check node group m corresponds to a sub-matrix defined via one permutation matrix, Lx is computed as
[equation image]
§ Principle of two phase message passing:
If bit node group i from check node group m corresponds to a sub-matrix defined via more than one permutation matrix, L_x is computed by using the second step of the two phase message passing.
Now the additional buffer of the sub-matrix is used; let B_j denote the additional buffer in that sub-matrix. First compute the a-posteriori probability
[equation image]
Now subtract the extrinsic information corresponding to that node to get the corresponding intrinsic or input information; for the first permutation matrix within a submatrix, compute
[equation image]
For the following permutation matrices within the submatrix compute
[equation image]
until all intrinsic or input information of the current check node group are computed.
o After the last computation, the intrinsic information of the check equations is obtained. Now the information is permuted according to the permutation matrices. This is accomplished by cyclically shifting the L_x values; the resulting values are denoted as L*.
[equation image]
Compute the extrinsic information according to equation (B21):
[Equation (B21) image]
For each check equation entry a new extrinsic information is obtained.
o Shift back the new extrinsic information according to the permutation matrix:
[equation image]
o Save the new extrinsic information into memory, distinguishing whether the resulting extrinsic information belongs to a sub-matrix with one or with more permutation matrices.
§ One permutation matrix: Save L_n in P_n^(ctr_n) and advance ctr_n = (ctr_n + 1) mod (w_c − 1).
§ Two or more permutation matrices:
• Save all L_n in P_n^(ctr_n) and advance ctr_n = (ctr_n + 1) mod (w_c − 1). Take care of the order.
• Save the extrinsic information L_n for the first permutation in B_j.
• Repeat the last step for the next iteration step k = k + 1.
• Final Iteration
o The a-posteriori information is computed in the last iteration of one bit node group, e.g. in the last computation of one column; equation (B16) is replaced with the following equation (B23):
[Equation (B23) image]
This computation is done if L_n is not required in any other check equation group.
• Make hard decisions for every bit in the bit group: x_n = sgn(L^post_n).
For the given example, we have:
Updating the First Check Node Group 0
For computing the first check node group we have the following buffer contents:
[buffer contents image]
The contents of the additional buffer is
[buffer contents image]
Buffer contents of the second column group, with ctr_1 = 0:
[buffer contents image]
B.2.1. Computing intrinsic or input information of CNG_0 and BNG_0:
First we compute L^post by adding the contents of P and B:
[equation image]
Now we subtract the extrinsic information in order to get the input values for our equation. First permutation matrix:
[equation image]
These values correspond to the intrinsic or input information at the locations L_{0,0}, L_{0,1}, L_{0,2}, L_{0,3}; compare equation (B10) in the two phase message passing, it is exactly the same. In the next step we compute
[equation image]
B.2.2. Computing intrinsic or input information of CNG_0 and BNG_1:
The intrinsic information from CNG_0 and BNG_1 is computed according to the turbo decoding principle:
[equation image]
B.2.3. Now one can compute the new extrinsic information from the check equation:
[equation image]
Fig. 6 shows a principle diagram of computing the extrinsic information, i.e. the principle of this concatenation. After computation of the tanh function we get the new extrinsic information. The values are shifted back according to the sub-matrix permutation offset and saved in memory. Example: shifting the first transposed column of the above equation yields our new extrinsic information L_{0,1} (in this case no shifting is necessary).
[equation image]
L_0 is saved twice, namely in P_0^(cntr_0) and in B_0^(0); L_1 is saved in P_0^((cntr_0 + 1) mod (w_c − 1)), and cntr_0 is incremented according to cntr_0 = (cntr_0 + 2) mod (w_c − 1), hence after computation we have
[buffer contents image]
The contents of the additional buffer is
[buffer contents image]
The new extrinsic information of L_4 is saved in P_1^(cntr_1) and cntr_1 is advanced as cntr_1 = (cntr_1 + 1) mod (w_c − 1). Hence our buffer contents of the second column group is
[buffer contents image]
Fig. 7 shows the buffer contents at successive iterations and subiterations. "P_0^0" stands for P_0^(0), and so on. The shaded table cells with bold font inscription symbolize the pointer positions of the counters. These positions are overwritten in the next sub-iteration step.
DECODING PRINCIPLE C
We call this Decoding Principle also "High Throughput Turbo Decoding Message Passing" or "HT-TDMP".
We again consider the example of Eq. (B1). Now, for each received or data bit an accumulator A_n is introduced. The accumulator value is saved in an accumulator buffer. The accumulator is initialized with the channel informations,
A_n = L_c_n ,   (C3)
the memories P are of course initialized with zero. The accumulator group for channel information group L_BNG_n of bit node group n is denoted as A_BNG_n.
• At iteration k: Carry out 3 decoding sub-iterations corresponding to the number of check equation groups.
o Perform sub-iteration t, t = 0,...,(M/S − 1). Compute for each check node group the extrinsic informations L_n, n ∈ C(m), using the channel informations L and the extrinsic information saved in P or B. For this we first have to compute the intrinsic or input information of each check node group for each sub-matrix; we denote this information as L_x. During computation of L_x it has to be distinguished whether the extrinsic information of L_x belongs to a sub-matrix with one or with more than one permutation matrices.
Perform for each entry in the check node group the following computation. § Principle of turbo decoding:
If group i from check equation group m corresponds to a sub-matrix defined via one permutation matrix, L_x is directly obtained from the accumulator: L_x = A_BNG_i   (C4)
§ Principle of two phase message passing: If group i from check equation group m corresponds to a sub-matrix defined via more than one permutation matrix, L_x is computed by using the second step of the two phase message passing; now the additional buffer of the sub-matrix is used; let B_j denote the additional buffer in that sub-matrix.
First compute the a-posteriori probability by
[equation image]
Now subtract the extrinsic information corresponding to that node to get the corresponding intrinsic or input information; for the first permutation matrix compute
[equation image]
For the next permutation matrix compute
[equation image]
And for the third compute
[equation image]
And so on, until computations for all l permutation matrices in the sub-matrix of bit node group n are done.
Prepare the accumulator to compute the a-posteriori probability; l is the number of permutation matrices in the sub-matrix of bit node group n, and w_c is the column weight of the current bit node group n. Compute
[equation image]
Note:
[equation image]
is not included in the accumulator, hence it is not necessary to subtract it.
Do this step until all intrinsic or input information of the current check node group are computed.
o After the last computation, the input information of the check equations is obtained. Now the information is permuted according to the permutation matrices. This is accomplished by cyclically shifting the L_x values; the resulting values are denoted as L*.
[equation image]
o Compute the extrinsic information:
[equation image]
For each check equation entry a new extrinsic information is obtained.
o Shift back the new extrinsic information according to the permutation matrix:
[equation image]
o Compute the a-posteriori value by adding the decoding result to the accumulator; for this we have to distinguish whether the extrinsic information belongs to one or to more permutation matrices.
§ In the case of one permutation:
[equation image]
§ In the case of more than one permutation, add the decoding results belonging to bit node group n:
[equation image]
o Now we have to prepare the accumulator to decode the next equation group; therefore we have to subtract the extrinsic value of the next pointer position. For this we again have to distinguish whether we have one permutation or more.
§ In the case of one permutation, subtract the extrinsic information of the next check node group:
[equation image]
§ In the case of more than one (two) permutations, denoting by l the number of permutation matrices in the sub-matrix of the bit node group (note: in DVB-S2, l = 2):
[equation image]
Now save the new extrinsic information into memory, distinguishing whether the resulting extrinsic information relates to a sub-matrix with one or with more than one permutation matrices.
§ In the case of one permutation matrix: Save L_n in P_n^(ctr) and advance
[equation image]
§ In the case of two or more permutation matrices:
• Save all L_n in P_n^(ctr) and advance
[equation image]
Take care of the order.
• Final Iteration
o The a-posteriori information is computed in the last iteration of one bit node group. It is simply obtained by omitting the computation of equations (C15) and (C16). In other words, the computation of equations (C15) and (C16) is not performed if the accumulator group A_BNG_n is not used any more.
• Make hard decisions for every bit in the bit group: x_n = sgn(A_BNG_n).
For the given example, we have:
Updating the First Check Node Group 0
For computing the first check node group we have the following buffer contents:
[buffer contents image]
Buffer contents of the second column group:
[buffer contents image]
The accumulator values are
[accumulator contents image]
Or in detail:
[accumulator contents image]
C.1. Computing intrinsic or input information of CNG_0 and BNG_0
The sub-matrix joining CNG_0 and BNG_0 is defined via two permutation matrices, therefore we have to add the additional buffer contents to the accumulator group A_BNG_0 and have to compute it with the two phase message passing approach.
First we compute L^post by adding the contents according to Eq. (C18):
[equation images]
Now we subtract the extrinsic information in order to get the input values for our equation. For the first permutation matrix we get, according to Eq. (C19):
[equation image]
These values correspond to the intrinsic or input information at the locations L_{0,0}, L_{0,1}, L_{0,2}, L_{0,3}, the same as in the Two Phase Message Passing.
In the next step compute the next permutation:
[equation image]
Prepare Accumulator Buffer A_BNG_0:
[equation image]
C.2. Computing intrinsic or input information of CNG_0 and BNG_1
The intrinsic information from CNG_0 and BNG_1 is computed according to the turbo decoding principle:
[equation images]
Accumulator buffer A_BNG_1 doesn't have to be prepared, because the bit node group is defined via one permutation.
C.3. Compute the new extrinsic information
The new extrinsic information can be computed as in B.2.3 above. After the step corresponding to Eq. (B24) we continue with:
C.4. Compute a-posteriori probabilities by adding the new extrinsic information to the accumulator:
[equation image]
C.5. Prepare Accumulator
For BNG_0 we use:
[equation image]
For BNG_1 we use:
[equation image]
C.6. Save new extrinsic values
For the values of BNG_0, L_0 is saved twice, namely in P_0^(cntr_0) and in B_0^(0); L_1 is saved in P_0^((cntr_0 + 1) mod (w_c − 1)), and cntr_0 is incremented as cntr_0 = (cntr_0 + 2) mod (w_c − 1), hence after computation we have
[buffer contents image]
The new extrinsic information of L_4 is saved in P_1^(cntr_1) and cntr_1 is advanced as cntr_1 = (cntr_1 + 1) mod (w_c − 1). Hence our buffer contents of the second column group is
[buffer contents image]
Fig. 8 shows the buffer contents for Decoding Principle C at the times of iteration and sub-iteration. Same as in Fig. 7, the shaded boxes with the bold type inscriptions symbolize the pointer positions of the counters. These positions are overwritten in the next sub-iteration step.
Fig. 9 shows the accumulator values in the iteration and sub-iteration steps. By omitting the computation of Eqq. (C15) and (C16) of the intrinsic value for the next check equation group, we get the final decision values shown in Fig. 10.
Same as for Decoding Principle B, Decoding Principle C is applicable to a decoding apparatus and method for any LDPC code where at least some of the submatrices are defined by more than one permutation matrix.
DECODING PRINCIPLE D
In the following description of Decoding Principle D, each set of an accumulator (an arithmetic-logic unit) and associated accumulator buffer (a memory cell) will together be called "accumulator". We again consider the example described under Decoding Principle B above, Eq. (B26).
For Decoding Principle D, same as for Decoding Principle C, an accumulator is introduced for each received or data bit, and the accumulator value is saved in an Accumulator Buffer A_n. The accumulator is initialized with the associated channel information:
A_n = L_c_n ,   (D3)
and the edge memories P are initialized with zero.
In the following, the Log-Likelihood-Ratios in an n-th Bit Node Group BNG_n are denoted as a Channel Information Group L_BNG_n, for which the associated accumulators are grouped into a corresponding Accumulator Group denoted as A_BNG_n. It should be mentioned that the accumulator buffer can be the a-priori/channel buffer L_c. The accumulator always contains the a-posteriori information.
For the decoding according to Decoding Principle D:
• At iteration k: Carry out decoding sub-iterations corresponding to the number of Check Node Groups:
o Perform sub-iterations t, t = 0,...,(M/S − 1). Compute for each Check Node Group the extrinsic informations L_n, n ∈ C(m), using the accumulator values A_BNG_n and the extrinsic information saved in the edge memory. Additionally compute a new a-posteriori information of BNG_n. For this, first the intrinsic or input information, denoted as L_x, of each Check Node Group for each sub-matrix has to be computed. These values are obtained by subtracting the extrinsic information associated with the permutation matrix participating in the Check Node Group from the accumulator:
[equation image]
Do this step until all intrinsic or input information of the current Check Node Group are computed.
In the following, these L_n's are denoted as L_n^(old).
o After the last of these computations, the input information of the check equations is obtained. Now the information is permuted according to the permutation matrices. This is accomplished by cyclically shifting the Lx values, the resulting values are denoted as L* .
L* = cyclic shift of L_x   (D5)
o Then compute the extrinsic information as
[Equation (D6) image]
For each non-zero entry in the parity check matrix row, corresponding to a check equation, a new extrinsic information is obtained.
o Shift back the new extrinsic information according to the permutation matrix:
L_n = cyclic shift of L*_n   (D7)
o Subtract the old extrinsic values from the accumulator:
§ In the case where the sub-matrix is defined by one permutation:
[equation image]
§ In the case where the sub-matrix is defined by more than one permutation, subtract the old extrinsic values belonging to Bit Node Group n.
o Compute the a-posteriori value by adding the decoding result to the accumulator. For this it must be distinguished whether the extrinsic information belongs to one or to more permutation matrices.
§ In the case of one permutation matrix:
[equation image]
§ In the case of more than one permutation, add the decoding results belonging to Bit Node Group n:
[equation image]
o Save back the new extrinsic information into the edge memory, where each L_n has its own memory location.
• Final Iteration
o The a-posteriori information is available after each sub-iteration and corresponds to the accumulator buffer values A_BNG_n.
• Make hard decisions for every bit in the Bit Node Group: x_n = sgn(A_BNG_n)
Decoding Principle D is applicable to every LDPC code, even to unstructured ones, where it is simpler.
DECODING PRINCIPLE E
Decoding Principle E is described in the following by comparing conventional TPMP decoding with Principle E decoding.
E.1 Conventional TPMP Decoding
For each permutation matrix, one edge memory word is foreseen, which in this example consists of S = 4 LLR words. For the parity check matrix of Eq. (E8), the edge memory layout is:
[edge memory layout image]
One memory place for storing an LLR value is associated to each non-null entry in the parity check matrix, i.e. the overall memory consumption is (13·S) LLR values.
E.1.1 Initialization
At time step k = 0, each symbol node n is assigned an a-posteriori LLR value. For every position (m,n) in the parity check matrix where H_{m,n} = 1, the associated LLR value in the associated LLR matrix is initialised:
[equation image]
After that, time step k is advanced by one.
E.1.2 Phase (I) or "Horizontal" step
For each check equation c_m, m = 0,...,M−1, i.e. for each matrix row, and for each n ∈ C(m), compute
[Equation (E13) image]
where k denotes the time or iteration step. More specifically, in the case of check equation C_0, first the set of existing terms L_{m,n}^(k−1) corresponding to the row of C_0 is read, a set of terms L_{m,n}^(k) is calculated according to the above Eq. (E13), and then the set of terms L_{m,n}^(k) is written back to the same memory positions that held the L_{m,n}^(k−1) before. The computed new L_{m,n}^(k) are the input of Phase (II).
For the example, we get:
For the first parity check equation C_0, we compute:
[equation image]
Because of the structured-ness of the code, subsets of S = 4 equations are computed at the same time, which amounts to updating the 4 equations in the first check node group simultaneously. This can be achieved by reading the memory and cyclically shifting the memory words, which are then fed to the appropriate check node processing units. After their computation, the results are shifted back and saved.
Updating the first check node group is done in the following way. Reading address 0:
[equation image]
Permuting the LLR word L_0 according to the permutation of the sub-matrix; in that case the permutation matrix is I^0. The new LLR word is called
[equation image]
Reading address 1:
[equation image]
Permuting the LLR word L_1 according to the permutation of the sub-matrix; in that case the permutation matrix is I^1 (cyclic shift). The new LLR word is
[equation image]
Reading L_4, L_7 and L_11, and permuting in the same way. According to the parity matrix, in the first check equation L_{0,0} has to be combined with L_{0,3}, with L_{0,4} and with L_{0,8}. As one can see, the first value of L'_0 and the first value of L'_1 are the correct values, which are to be combined. Now the whole vector can be computed at once. Hence all four equations, the whole check node group, can be updated in parallel with
L'' = L'_1 ⊕ L'_4 ⊕ L'_7 ⊕ L'_11 = ((L_{0,3} ⊕ L_{0,4} ⊕ L_{0,8}), (L_{1,0} ⊕ L_{1,5} ⊕ L_{1,9} ⊕ L_{1,16}), ..., ...)
where ⊕ denotes the check node combination.
Note: The fact that L_{0,19} doesn't exist in the parity check matrix has to be taken into account in the computation of the extrinsic information. The first column corresponds to the computation of the above Eq. (E14). Cyclically shift back the extrinsic values:
[equation images]
and save the values back to memory: at address 0 write
[equation image]
and at address 1 write:
[equation image]
Addresses 4, 7 and 11 are processed accordingly.
Increase time step k by one.
E.1.3 Phase (II) or "Vertical" step
For each symbol node n = 0,...,N−1, and for each m ∈ F(n), compute
[Equation (E25) image]
where k denotes the time step. More specifically, in the case of column x_0, first the set of existing terms L_{m,n}^(k−1) corresponding to one specific column n of the parity matrix H is read, then a set of terms L_{m,n}^(k) is calculated according to the above Eq. (E25), and then the set of terms L_{m,n}^(k) is written to the same memory positions that held the L_{m,n}^(k−1) before.
For example, the new input values for Phase (I) of equation C_0 and bit node group zero are
[equation images]
For C_1 we have
[equation image]
In practice, S = 4 intrinsic informations are updated at the same time, but this time no shifting is necessary; see the memory configuration above.
E.1.4 Decision
For the final decision, all a-posteriori probabilities are to be included in the final computation, hence also the symbol node itself. For each n = 0,...,N−1, and for each m ∈ F(n), compute
[equation image]
The decision is made on the sign of R_n (i.e. u_n = 1 if R_n < 0). If uH^T = 0, then halt the algorithm with u as the decoder output; otherwise go to step (I) and increase time step k by one.
E.2 Principle E decoding, employing turbo decoding only on the triangular part of the parity check matrix
For each permutation matrix, one memory word is provided. For the example considered here, a memory word consists of S = 4 LLR words. Conceptually, the memory is divided into different regions. The edges of the parity matrix parts corresponding to A1 and A2 are saved in addresses
[address list image]
so that for each entry in A1 and A2 of the parity check matrix, one memory place exists.
The T matrix, however, is saved in
[address list image]
One can see that only one memory place is necessary for each column of the T matrix. The overall memory consumption in this example is (10·S) LLR values, as opposed to (13·S) with the conventional decoding.
Further, we use F(n) = {m : H_{m,n} = 1} and C(m) = {n : H_{m,n} = 1} as defined above. Additionally, N_A1 shall denote the number of symbol nodes in matrix part A1, N_A2 the number of symbol nodes contained in matrix part A2, and N_A = N_A1 + N_A2. In our example, N_A1 = 4 and N_A2 = 4.
E.2.1 Initialization
Each symbol node n of parity matrix parts A1 and A2 is assigned an a-posteriori LLR value. In other words, for every position (m,n) where H_{m,n} = 1 and n ≤ N_A − 1, at time step k = 0,
[equation image]
For n > N_A − 1, the corresponding memory addresses of the T matrix are initialized with zero:
[equation image]
Increase time step k by one.
For the example, this initialization yields the memory contents as:
[memory contents image]
E.2.2 Phase (I): Horizontal step
The following is carried out for each check equation, i.e. for each row of the parity check matrix. For each equation c_m, m = 0,...,M−1, and for each n ∈ C(m), we have to compute the new extrinsic value. Therefore we load the set of existing terms L_{m,n}^(k−1), n ∈ C(m), for the specific row m from memory. In Decoding Principle E, this set is a mixture of intrinsic and extrinsic information. For the matrix part where n ≤ N_A − 1, the information is intrinsic and is part of the matrix A1 or A2; for the part where n ≥ N_A, the information is extrinsic and is part of the matrix part T. Therefore, in case of extrinsic information we have to build the intrinsic information by adding the a-priori value:
[equation image]
Now we can compute the new extrinsic values by evaluating
[equation image]
The old extrinsic values are overwritten with the new extrinsic values. Note that the columns in matrix part T share one memory location.
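For illustration only, this mixed load of intrinsic and extrinsic values, and the shared store for T columns, can be sketched as follows (Python; N_A, the a-priori array and all names are our own stand-ins, not the specification's):

N_A = 8                     # number of symbol nodes in A1 and A2 (example value)
Lc  = [0.5] * 20            # a-priori values, indexed by symbol node (stand-ins)

def load_intrinsic(n, stored):
    """Phase (I) load: A1/A2 entries are stored as intrinsic values,
    T entries as extrinsic values completed with the a-priori value."""
    if n <= N_A - 1:
        return stored               # A1/A2 part: already intrinsic
    return stored + Lc[n]           # T part: extrinsic + a-priori

def store_extrinsic(mem, addr, new_extr):
    """T columns share one memory location: the new extrinsic value
    overwrites the old one and feeds the next check equation directly."""
    mem[addr] = new_extr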
For the example, and without parallelization, this amounts to: For the first check equation C0, compute:
[equation image]
In practice, a subset of S = 4 equations is computed simultaneously, e.g. updating the 4 equations in the first check node group at the same time. This can be achieved by reading the memory, cyclically shifting the memory words, feeding the appropriate check node processing units, and after their computation shifting back and saving the result.
With parallelization, this amounts to:
Updating the first check node group is done in the following way. Reading address 0:
L_0 = (L_{0,0}, L_{1,1}, L_{2,2}, L_{3,3})   (E38)
All n of L_{m,n} in L_0 satisfy n < N_A ⇒ L^intr_{m,n} = L_{m,n}, so that
[equation image]
Permute the LLR word $L^{intr}_0$ according to the permutation of the submatrix; in this case the permutation matrix is $I^0$. The resulting LLR word is

[equation image: the permuted LLR word]
Reading addresses 1 and 4 proceeds in exactly the same way for $L^{intr}_1$ and $L^{intr}_4$. The handling of addresses 7 and 11 is different. Read address 7:

[equation image: the LLR word read at address 7]
Due to address sharing, the memory at address 7 always contains the extrinsic value for the next check node update of that column.
All $n$ of $L_{m,n}$ in $L_7$ satisfy $n \ge N_A \Rightarrow L^{intr}_{m,n} = L^{k-1}_{m,n} + L_{c,n}$, so that

[equation image: the intrinsic LLR word $L^{intr}_7$]
As one can see, $L^{intr}_{4,8}$ directly corresponds to Eq. (E27). Hence the vertical process of the matrix part T is included in the horizontal process. Permute the LLR word $L^{intr}_7$ according to the permutation of the submatrix; in this case the permutation matrix is $I^0$. The resulting LLR word is

[equation image: the permuted LLR word]
The same computation is done for $L^{intr}_{11}$. According to the parity check matrix, in the first check equation $L_{0,0}$ has to be combined with $L_{0,3}$, with $L_{0,4}$ and with $L_{0,8}$. As one can see, the first value of $L_0$ and the first value of $L_1$ are the correct values to be combined. Now the whole vector can be computed at once. Hence all four equations, i.e. the new extrinsic information of the whole check node group, can be updated in parallel with
[equation image: parallel update of all four check equations of the group]
Note that the case where $L_{0,19}$ does not exist in the parity check matrix has to be taken into account in the computation of the extrinsic information.
The first column corresponds to the computation of Eq. (E33) above. Cyclically shift back the extrinsic values:

[equation images: the back-shifted extrinsic LLR words]
Save the values back to memory: at address 0, write

[equation image: memory word written to address 0]

and at address 1:

[equation image: memory word written to address 1]
Addresses 4, 7 and 11 are handled accordingly.
Note that, because intrinsic values are directly computed again in the symbol node update process, the values $(L_{4,8}\ L_{5,9}\ L_{6,10}\ L_{7,11})$ are not needed anymore. Therefore they are overwritten with the extrinsic values $(L_{0,8}\ L_{1,9}\ L_{2,10}\ L_{3,11})$ of the next equation group. In that way the resulting new extrinsic information is directly used in the next computation, which is again an implementation of the "turbo decoding principle" of this invention, and leads to two effects, namely less memory and faster convergence.
E.2.3 Phase (II): Vertical step
For each symbol node $n = 0,\ldots,N_A - 1$, and for each $m \in F(n)$, compute
$$L^{k}_{m,n} = L_{c,n} + \sum_{m' \in F(n)\setminus\{m\}} L^{k}_{m',n} \qquad (E50)$$
This computation is only needed for matrix parts A1 and A2; it is not necessary for the triangular part T of the parity check matrix / associated likelihood matrix H. More specifically, in the case of column $x_0$: first, the set of existing terms $L^{k-1}_{m,n}$ corresponding to one specific column $n$ of the parity check matrix H is read; then a set of terms $L^{k}_{m,n}$ is calculated according to Eq. (E50) above; and then the set of terms $L^{k}_{m,n}$ is written to the same memory positions that held the $L^{k-1}_{m,n}$ before. The computed new $L^{k}_{m,n}$ are then used as the input of another Phase (I).
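A minimal sketch of this vertical step for one column n < N_A, with the column's edges held as a mapping from row index to the stored LLR (names illustrative):

```python
def symbol_node_update(col_edges, Lc_n):
    """Phase (II) sketch, Eq. (E50): for every row m of column n,
    L^k_{m,n} = L_{c,n} + sum over m' != m of L^{k-1}_{m',n}."""
    total = Lc_n + sum(col_edges.values())
    # subtracting the edge's own value from the full sum excludes m' = m
    return {m: total - L for m, L in col_edges.items()}
```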
For our example, the new input values for Phase (I) of check equation $c_0$ and bit node group zero are

[equation image: updated input values for $c_0$]

For $c_1$, we have

[equation image: updated input values for $c_1$]
In practice, $S = 4$ intrinsic values are updated at the same time, but this time no shifting is necessary; see the memory configuration above.
E.2.4 Decision
For the final decision, all a-posteriori probabilities are to be included in the final computation, hence also the symbol nodes themselves. For each $n = 0,\ldots,N_A - 1$, compute, with the sum running over all $m \in F(n)$,

$$R_n = L_{c,n} + \sum_{m \in F(n)} L^{k}_{m,n}$$
The decision is made on the sign of $R_n$, i.e. $\hat{u}_n = 1$ if $R_n < 0$. If $\hat{u}H^T = 0$, which can be computed during Phase (I), then the algorithm is halted with $\hat{u}$ as the decoder output. Otherwise increase time step $k$ by one and loop back to Phase (I).
E.3 Decoding Principle E hardware
Figure 11 shows the block diagram of an LDPC decoder hardware according to Decoding Principle E. The data is saved in the input memory "DEINT". The LDPC decoder is started via the start signal "START Decoder" once one frame has been completely received in the "DEINT" buffer. The iteration control starts the individual processes of the LDPC decoder. The process consists of three phases, namely an initialization, a decoding and an output phase. The initialization phase is done by the block "load_cont", the decoding is done by the "run_cont" and "bpu_cont" blocks, and the output phase is done by the block "bit_stream". For all this, the iteration control block sets up the corresponding data path via the multiplexers MUX1, DEMUX3, MUX4, MUX6 and MUX7.
E.3.1 Initialization phase
The iteration control block "iter_control" sets the multiplexers to MUX1=0, MUX6=0, MUX7=0, thus enabling path 1. Now the block "load_cont" has control of the input memory and the edge memory. The iteration control block sends a start pulse to the block "load_cont". The load_cont block reads the data contents of the input memory and writes the data to the edge memory. As known, the parity check matrix of the LDPC code can be split into
$$H = (A \quad T)$$

where $q$ depends on the code configuration and code rate and represents the number of check node groups.
$$A = (A_1 \quad A_2)$$

where the 1 in A1 is an index and not a "power-of" operation. The edge memory is initialized in a column-oriented way. That means that for each column the corresponding channel LLR value is written $w_r$ times into successive memory locations in the edge memory. This is done for A1 and A2. A conventional LDPC decoder would also initialize the matrix T. However, in our Decoding Principle E, the turbo decoding principle is used on these nodes, so that the memory necessary for that part of the matrix is reduced by a factor of 2 and is initialized with zero. This reduces the overall memory consumption of the LDPC decoder. In this approach the memory consumption necessary for the edges is 720x2160 bits, whereas a conventional decoder needs 792x2160 bits. Figure 12 shows an example of a memory initialization as described, for a code of code rate 1/3 and q = 120. BNG in the figure means symbol or bit node group and denotes the actual column group of the parity matrix. CNG means check node group and denotes the row group of the parity matrix. $N_{A_1} = bitgrp1$ and $N_{A_2} = bitgrp2$ denote the start of the next part of the parity check matrix. The even addresses correspond to edge memory 1 and the odd addresses correspond to edge memory 2. This code configuration requires 600 edges; a conventional decoder would require 600 memory locations. The new approach needs only 540 memory locations, without losing any edges, by exploiting the fact that one memory location is used for two edges. After initialization, the block "load_cont" sends an acknowledge back, indicating to the iteration control block that the decoder is ready for decoding.
E.3.2 Decoding
As usual, the decoding process comprises a check node update and a bit node update.
E.3.2.1 Check Node Update, performed in "Check Processing Units"
The iteration control block "iter_control" sets the multiplexers to MUX1=1, MUX4=0, MUX6=2, MUX7=2, DEMUX3=0. Now the block "run_cont" has control of the input memory, edge memory, adder, shifter and cpu/bpu units. The iteration control block sends a start pulse to the block "run_cont". The block now updates the intrinsic information in the edge memory with the extrinsic information. This is done in a sequential manner by reading and writing check node group by check node group. For the example, check node group 0 (CNG 0) comprises the addresses 0, 157, 480 and 539. The block reads the word at address 0 (edge memory 1) and feeds the value through the shifter to the check processing units (CPU), together with a synchronization signal indicating the start of a check node group. Then it feeds the check processing units with the word at address 157 (edge memory 2). Address 480 lies in the T part of the matrix, where turbo decoding is applied. Therefore the channel value (a-priori) of bit node group 120 is read from the input memory and added to the word at address 480 from the edge memory. The result is fed through the shifter to the CPU unit. The same is done for address 539. The new extrinsic information is saved back to the memory locations 0, 157, 480 and 539. According to the invention, the result of this computation is used in the following check node updates. For instance, check node group 1 comprises the addresses 13, 158, 480 and 481. Again, address 480 lies in the T part of the matrix. The channel value of bit node group 120 is read from the input memory, the previously written word at address 480 is read from the edge memory, and the two are added together. That means that the result in the T part of the matrix for check node group 0 is used in the subsequent check node updates. As a consequence, the algorithm converges faster than conventional algorithms. To handle all the timing and groups, the block "run_cont" reads an internal ROM memory containing information on the edge configuration and timing information for the memories, shifter and adder. Due to the partitioning of the memory it is possible to read one memory while the other is being written, and vice versa.
E.3.2.2 Bit/Symbol Node Update, performed in Bit Processing Units (BPU)
For the bit node update, the iteration control block "iter_control" sets the multiplexers to MUX1=1, MUX4=1, MUX6=1, MUX7=1, DEMUX3=1. Now the block "bpu_cont" has control of the input memory, edge memory and cpu/bpu units. The iteration control block sends a start pulse to the block "bpu_cont". The block now updates the extrinsic information in the edge memory with the intrinsic information. This is done in a sequential manner by reading and writing bit/symbol node group by bit/symbol node group. For the example, bit/symbol node group 0 (BNG 0) comprises the addresses 0, 1, ..., 12. The block reads the word at address 0 (edge memory 1) and feeds the value to the bit/symbol processing units (BPU), together with a synchronization signal indicating the start of a bit node group. The BPU computes the a-posteriori and the intrinsic value. The a-posteriori value is used by the block "bit_stream". The intrinsic values are saved at the same locations where the extrinsic values were located before, hence addresses 0, 1, ..., 12.
This computation is done for matrix parts A1 and A2. Matrix part T is omitted.
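What one bit processing unit computes for a bit node group of A1/A2 can be sketched at word level as follows (names illustrative):

```python
def bit_node_update(edge_words, chan_word):
    """BPU sketch: per bit of the group, a-posteriori = channel value plus all
    extrinsic values of its column; the intrinsic value written back for each
    edge is the a-posteriori minus that edge's own extrinsic value."""
    S = len(chan_word)
    apost = [chan_word[i] + sum(w[i] for w in edge_words) for i in range(S)]
    intr_words = [[apost[i] - w[i] for i in range(S)] for w in edge_words]
    decisions = [1 if a < 0 else 0 for a in apost]  # used by "bit_stream"
    return intr_words, apost, decisions
```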
E.3.3 Output phase, i.e. output of data (bit_stream)
During the BPU process the bit decisions are available at the output of the BPU units. If the block "iter_control" decides to stop the iteration, it sends a pulse on the signal "start output" before the last BPU is started. This causes the decisions made during the last BPU process to be saved in the output buffer and sent to the output.
DECODING PRINCIPLE F
According to Decoding Principle F, for each bit, i.e. for each column of the parity check matrix / LLR matrix, an accumulator is introduced. The accumulator value is saved in an accumulator buffer. The accumulator is initialized with the associated channel information:
$$A_n = L_{c,n} \qquad (F3)$$

and the edge memory is initialized with zero. The accumulator group for the channel information of the subvector $L_{BNG_n}$ of bit node group $n$ is denoted as $A_{BNG_n}$. The accumulator buffer can be the a-priori/channel buffer $L_c$. During the decoding, the accumulators always contain the a-posteriori informations. At every iteration $k$, decoding sub-iterations corresponding to the number of check node groups are carried out, as listed below.
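Before the sub-iteration steps, the initialization of Eq. (F3) can be sketched as follows (word-oriented buffers, one LLR word per bit node group; all names illustrative):

```python
def init_principle_f(Lc_words, num_edge_words, S):
    """Accumulators start as the channel LLR words (the channel buffer itself
    may serve as the accumulator buffer); the edge memory starts at zero."""
    acc = [list(word) for word in Lc_words]               # A_n = L_{c,n}
    edge_mem = [[0.0] * S for _ in range(num_edge_words)]
    return acc, edge_mem
```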
• Perform sub-iteration: compute for each check node group the new extrinsic informations $L^{new,extr}$, using the accumulator values $A^{k}_{BNG}$ and the extrinsic information saved in the edge memory. Additionally compute a new a-posteriori information of $BNG_n$. For this we first have to compute the intrinsic or input information of each check node group for each submatrix. These values are obtained by subtracting the extrinsic information, belonging to the permutation matrix participating in the check node group, from the accumulator. Specifically, according to this invention, the cyclic shift is done at the accumulator.
o Read the accumulator buffer and shift it to perform the permutation:
$$A^{perm}_{BNG_i} = P \cdot A^{k}_{BNG_i} \qquad (F4)$$
according to the associated permutation matrix.
o Compute the intrinsic information by subtracting the extrinsic information of the edge memory from the accumulator:
$$L^{intr} = A^{perm}_{BNG_i} - L^{extr} \qquad (F5)$$
Do this step until all intrinsic or input informations of the current check node group are computed. In the following, these $L^{intr}$'s are denoted as $L^{0}$. (In hardware it is recommended to store these values in a FIFO.)
o Compute new extrinsic informations according to the equation
$$L^{new,extr}_{m,n} = 2\tanh^{-1}\left(\prod_{n' \in C(m)\setminus\{n\}} \tanh\left(\frac{L^{0}_{m,n'}}{2}\right)\right) \qquad (F6)$$
For each check equation entry a new extrinsic information is obtained.
§ Update the new a-posteriori information:

$$A^{new,perm} = L^{0} + L^{new,extr} \qquad (F7)$$
§ Shift back the a-posteriori information, according to the permutation matrix:

$$A^{new}_{BNG_i} = P^{-1} \cdot A^{new,perm} \qquad (F8)$$
§ Distinguish now between one or more permutation matrices in $BNG_i$:
• In case of one permutation matrix, calculate

$$A^{k+1}_{BNG_i} = A^{new}_{BNG_i} \qquad (F9)$$
• In case of more than one permutation matrix in $BNG_i$, add the decoding results belonging to bit node group $i$. Perform a backward shift of the corresponding extrinsic information and update the a-posteriori sum:

$$A^{k+1}_{BNG_i} = A^{new}_{BNG_i} + \sum P^{-1}\left(L^{new,extr} - L^{extr}\right) \qquad (F10)$$
§ Save the new a-posteriori information $A^{k+1}_{BNG}$ and the new extrinsic information $L^{new,extr}$.
• Final iteration: the a-posteriori information is available after each sub-iteration and corresponds to the accumulator buffer values $A_{BNG}$. Because the a-posteriori sum is updated during each sub-iteration, the decoding process can be stopped at any time.
• Make hard decisions for every bit in the bit group: $\hat{x}_n = \operatorname{sgn}(A_{BNG_n})$.

As one can see, the extrinsic informations are not shifted; the shifter is relocated to the accumulator buffer, which has several advantages for the hardware.
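Putting the listed steps together, the following is a compact software sketch of one sub-iteration for the case that every submatrix of the check node group contains a single permutation matrix. The min-sum rule stands in for the check node update (a common hardware simplification, not necessarily the exact CNU rule of this invention), and all names are illustrative:

```python
import numpy as np

def cnu_min_sum(vals):
    """Min-sum stand-in for the check node update of one check equation."""
    v = np.asarray(vals, dtype=float)
    out = np.empty_like(v)
    for i in range(len(v)):
        others = np.delete(v, i)
        out[i] = np.prod(np.sign(others)) * np.abs(others).min()
    return out

def sub_iteration(acc, edge_mem, group, shift, S):
    """One Decoding Principle F sub-iteration for one check node group.

    acc:   accumulator words (a-posteriori sums), one list of S LLRs per BNG
    group: list of (bng, addr) pairs, one per permutation matrix of the group
    shift: cyclic shift of each permutation matrix, keyed by edge address
    """
    rot = lambda w, s: w[s % S:] + w[:s % S]
    # Eqs. (F4)/(F5): shift the accumulators, subtract the old extrinsic
    fifo = [[a - e for a, e in zip(rot(acc[bng], shift[addr]), edge_mem[addr])]
            for bng, addr in group]
    # Eq. (F6): one CNU call per check equation (bit position in the word)
    per_eq = [cnu_min_sum([f[i] for f in fifo]) for i in range(S)]
    for j, (bng, addr) in enumerate(group):
        new_extr = [per_eq[i][j] for i in range(S)]
        edge_mem[addr] = new_extr                           # save new extrinsic
        a_new = [f + e for f, e in zip(fifo[j], new_extr)]  # Eq. (F7)
        acc[bng] = rot(a_new, -shift[addr])                 # Eqs. (F8)/(F9)
    return acc, edge_mem
```

Because the accumulators always hold the current a-posteriori sums, hard decisions can be taken from acc after any sub-iteration, matching the early-stopping property described above.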
Figure 13 shows a suggested hardware architecture for Decoding Principle F. The accumulator memory contains the a-posteriori sum and is initialized with the channel LLR values according to Eq. (F3). These values are updated at each sub-iteration. To that end the accumulator values are loaded into the shifter and shifted according to the associated permutation matrices, as in Eq. (F4). The corresponding extrinsic information is loaded from an edge memory, which can be implemented as a FIFO where read underflows are treated as reading zero. The extrinsic information is subtracted from the shifted a-posteriori value according to Eq. (F5). The resulting intrinsic value is fed to the CNU unit and to a demultiplexer. If the submatrix consists of one permutation matrix, the intrinsic value from the demultiplexer is fed to the FIFO. If the submatrix consists of more than one permutation matrix, the first intrinsic value is fed to the FIFO and afterwards the FIFO is fed with the corresponding negative extrinsic values for that submatrix; this prepares the computation inside the brackets of Eq. (F10). Now the CNU computes the new extrinsic values according to Eq. (F6). These values are saved in the edge memory. At the output of the CNU the new extrinsic value is added to the output of the FIFO and passed to the shifter according to Eq. (F7). The output of the shifter is fed to an adder. Again a distinction is made between one or more permutation matrices in one submatrix. If the submatrix is defined via one permutation matrix, the output value of the shifter is directly loaded into the a-posteriori register (no adding). If the submatrix is defined via more than one permutation matrix, the first value is loaded into the a-posteriori register and the following extrinsic update values are added to that value, as in Eq. (F10). The old a-posteriori sum in the accumulator memory is overwritten with the new a-posteriori value in the a-posteriori register.
As an example, let us assume again the parity check matrix and associated LLR matrix of Eq. (F1). Figures 14 and 15 show the associated accumulator memory content and edge memory content, respectively.
Figure 16 shows inputs and outputs of key elements of the Decoding Principle F hardware. As one can see, in order to update the sum for bit 0 one has to compute
[equation image: update of the a-posteriori sum for bit 0]

where $\hat{L}$ indicates the newly computed extrinsic value. In Fig. 16 the step (table row) "write A0" corresponds to exactly that computation.

Claims
1. A method for decoding LDPC coded data, the method employing, in each iteration, likelihood values in a memory associated to nonzero elements of the parity check matrix of the LDPC code, and a check node update step generating check node updated likelihood values being performed for consecutive rows of the parity check matrix, the method characterized in that before performing the check node update step for a row of the parity check matrix, symbol node update steps are performed in columns where the row of the parity check matrix has set elements, in which symbol node update steps the check node updated likelihood values are used from the memory associated with matrix rows above the current row and likelihood values from a previous iteration are used from the memory associated with the matrix rows below the current row.
2. The method of claim 1, wherein the symbol node update steps are performed immediately before consecutive ones of the check node update steps .
3. The method of claim 1 or 2, wherein the check node updated likelihood values from the check node update step are stored using memory positions associated with set elements of the subsequent row of the parity check matrix.
4. The method of claim 1 or 2, applied on a structured code the parity check matrix of which has submatrices consisting of one or more permutation matrices, wherein for those sub-matrices consisting of two or more permutation matrices, an additional buffer of a size equalling the number of permutation matrices minus one is provided, and the intrinsic information is computed by the second step of the two phase message passing, using the additional buffer of the sub-matrix.
5. The method of claim 3, wherein an accumulator is used, wherein the accumulator is initialized with the associated channel information, wherein the check node updated likelihood values are added to the accumulator and the contents at the memory positions associated with set elements of the subsequent row of the parity check matrix is subtracted from the accumulator, and wherein the check node updated likelihood values are then stored to the memory positions associated with set elements of the subsequent row of the parity check matrix.
6. A method for decoding structured LDPC coded data having check node groups and bit node groups, and employing one accumulator for each bit node and an edge memory, where a decoding subiteration is performed for each of the check node groups, the method characterised by steps of
- computing for each Check Node Group new extrinsic informations using the accumulator values and information from the edge memory;
- computing the intrinsic information of each Check Node Group for each sub-matrix by subtracting old extrinsic information, associated with the permutation matrices constituting the sub-matrix, from the accumulator;
- computing a-posteriori information by adding the new extrinsic information to the accumulator.
7. The method of claim 2, wherein the symbol node update steps immediately before consecutive ones of the check node update steps are performed only in contiguous columns of the parity check matrix that have column weight 2, wherein a single buffer is used for each of the contiguous columns, wherein the intrinsic information is computed anew for each row of the parity check matrix by column-wise accumulating the a-priori information and the extrinsic information, and by writing the new resulting extrinsic information over the old extrinsic information, and wherein the new resulting extrinsic information is used in the next check node update step.
8. The method of claim 6, wherein the edge memory is organised in words associated to the permutation matrices constituting the submatrices of the parity check matrix, each edge memory word comprising likelihood elements to store one likelihood value for every bit node of a bit node group, the method additionally characterised by steps of:
- storing in the first likelihood element of each edge memory word the likelihood corresponding to the first sub-equation of the associated bit node group,
- storing in the i-th likelihood element of an edge memory word the likelihood associated to the i-th row of the permutation matrix associated to the edge memory word,
- in the step of computing the intrinsic information, performing a cyclic shift at the accumulator.
9. An apparatus for decoding LDPC coded data, characterized by being equipped and configured to execute the decoding method of one of claims 1 to 8.
10. The apparatus of claim 9, having accumulators and associated accumulator buffers associated to received data bits.