GB2463252A - A message passing LDPC matrix decoder with parallel sub-decoders scheduled to avoid memory contention problems - Google Patents


Info

Publication number
GB2463252A
Authority
GB
United Kingdom
Prior art keywords
rows
decoder
sub
decoding
zero
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0816146A
Other versions
GB2463252B (en)
GB0816146D0 (en)
Inventor
Mohamed Rafiq Ismail
Imran Ahmed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Europe Ltd
Original Assignee
Toshiba Research Europe Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Research Europe Ltd filed Critical Toshiba Research Europe Ltd
Priority to GB0816146A
Publication of GB0816146D0
Publication of GB2463252A
Application granted
Publication of GB2463252B
Status: Expired - Fee Related
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/11 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits using multiple parity bits
    • H03M13/1102 Codes on graphs and decoding on graphs, e.g. low-density parity check [LDPC] codes
    • H03M13/1105 Decoding
    • H03M13/1131 Scheduling of bit node or check node processing
    • H03M13/1137 Partly parallel processing, i.e. sub-blocks or sub-groups of nodes being processed in parallel
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/11 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits using multiple parity bits
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/29 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes combining two or more codes or code structures, e.g. product codes, generalised product codes, concatenated codes, inner and outer codes
    • H03M13/2957 Turbo codes and decoding
    • H03M13/2978 Particular arrangement of the component decoders
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/004 Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0045 Arrangements at the receiver end
    • H04L1/0047 Decoding adapted to other signal detection operation
    • H04L1/005 Iterative decoding, including iteration between signal detection and decoding operation

Abstract

A decoder for decoding a low-density parity check (LDPC) matrix using an increased number of sub-decoders, each operating on a separate row of the matrix and thereby increasing throughput. Time slots allocated to the decoding of individual non-zero entries in each row are scheduled so that no more than one non-zero entry of each column of the parity check matrix is decoded in each time slot according to a layered (message-passing) decoding algorithm (figs. 4 and 7, not shown). Memory associated with each column of the parity check matrix is thus only accessed by one processor per time slot and memory conflict problems are therefore avoided. Also disclosed is switching to decoding of a smaller number of rows in parallel to reduce power consumption.

Description

A Decoder and a Method of Decoding
FIELD OF THE INVENTION
The present invention relates to a turbo decoding message passing (TDMP) decoder. In particular the present invention relates to a TDMP decoder comprising a decoder scheduler arranged to provide high throughput.
BACKGROUND OF THE INVENTION
Low Density Parity Check (LDPC) codes have shown error correcting performance approaching the capacity of associated transmission channels and thus are desirable in transmission systems requiring robust performance. In addition their comparatively simple decoding structure has led to them being adopted in high throughput systems.
"Low-Density Parity-Check Codes", (R. G. Gallager, Cambridge, MA: MIT Press, 1963) describes iterative Two-Phase Message Passing (TPMP) algorithms commonly used to decode LDPC codes. TPMP decoding involves an exchange of information between check nodes and variable nodes. An iteration consists of the set of check nodes being updated in one half of the iteration followed by an update of the variable nodes in the second half iteration.
Updating of each set of nodes may be done in a fully parallel fashion. However, such an approach is hampered by the need to access memory locations in a concurrent manner resulting in significant routing congestion when implemented in hardware.
"Field Programmable Gate Array Implementation of a Generalized Decoder for Structured Low-Density Parity Check Codes" (L. Sun, B.V.K.V.
Kumar, IEEE Int'l. Conf. On Field-Programmable Technology, 2004, pp. 17-24) proposes partially parallel TPMP decoder architectures where the nodes are grouped together before processing in a parallel fashion, thereby reducing the overall routing network.
"Mapping interleaving laws to parallel turbo and LDPC decoder architectures" (A. Tarable, S. Benedetto; IEEE Trans. On. Info. Theory; vol. 50; issue 9; pp. 2004-2009; Sept. 2004) presents an alternative scheme, again based on partitioning of check node and variable node processing. The check nodes are processed by a number of processors in one half of an iteration with a mapping algorithm ensuring memory access collisions do not occur. Variable nodes are similarly processed in the second half-iteration. The memory contention problem reduces to a problem of node colouring or edge colouring of a graph.
"Shuffled Iterative Decoding" (J. Zhang, M. P. C. Fossorier; IEEE Trans. Comms., vol. 53, no.2, pp. 209-213, Feb. 2005) addresses the problem of memory contention in the context of LDPC encoder design where it is again described as a graph colouring problem. TPMP decoding typically requires a large number of iterations and though intrinsically parallel in its operation its implementation requires a physical network of wires mirroring the topology of the code's Tanner graph.
Due to these bottlenecks alternative algorithms have been proposed as detailed in "Shuffled Iterative Decoding" (J. Zhang, M. P. C. Fossorier; IEEE Trans. Comms., vol. 53, no.2, pp. 209-213, Feb. 2005), "LDPC Code Construction with Flexible Hardware Implementation" (D. E. Hocevar, IEEE Int'l.
Conf. On Comms. (ICC), Anchorage, AK, pp. 2708-2712, May 2003), "A Reduced Complexity Decoder Architecture via Layered Decoding of LDPC Codes" (D. E. Hocevar, IEEE workshop on Signal Processing Systems (SIPS) 2004, pp. 107-112), US 2006/0123318 entitled "Method and Apparatus for Decoding Low Density Parity Check Code Using United Node Processing", and "A Turbo-Decoding Message-Passing Algorithm for Sparse Parity-Check Matrix Codes" (M. M. Mansour, IEEE Trans. Signal Processing, vol. 54, no. 11, Nov.
2006). An LDPC decoding method in which convergence is improved by updating each bit node once during one iteration and updating each check node each time a neighbouring bit node is updated is known from US 2008/0010579 entitled "Apparatus and Method for Receiving signal in a Communication System". Layered decoding techniques, as referred to by Hocevar above, are generally techniques that decode parity check matrices based on a received input signal on a row by row basis, wherein the decoding of a non-zero entry of a row of the parity check matrix is based on a previously generated decoding result of a non-zero entry located in a different row but in the same column of the parity check matrix.
The paper by Mansour introduces the Turbo-Decoding Message-Passing (TDMP) algorithm, which reduces both the number of iterations and the storage memory required to decode a block of data. The TDMP algorithm does not distinguish between check nodes and variable nodes, instead processing each data value associated with a non-zero entry in the parity check matrix. Thus, messages are of only one type and are propagated from node to node as soon as they are computed. These two factors contribute to reducing storage space and the number of iterations.
The TDMP algorithm operates on a sparse density parity check matrix comprising a number of columns, using a memory with a number of memory elements, each element associated with one of the columns of the matrix.
Decoding starts by initialising the memory with the received signal. Then an iterative procedure is started. In a first step the difference between the memory content and a row of the matrix is formed and this difference is then decoded using a suitable decoding algorithm, such as the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm. The output of this decoding step is then written back into the parity check matrix in place of the row operated upon by the decoder. The sum of the output of the decoding step and the above formed difference is then calculated and written back into the memory. The iterative procedure continues by applying the above discussed calculation to each row of the matrix, using the output of the previous decoding step stored in the memory as new input. Once all rows of the matrix have been operated upon the entire procedure is or can be repeated for the entire matrix until acceptable convergence has been achieved or a preset maximum number of iteration steps has been reached.
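For illustration only, the following is a minimal Python sketch of the iteration just described; it is not part of the original disclosure. A min-sum check-node update stands in for the BCJR row decoder mentioned above, and all function and variable names are hypothetical.

```python
import numpy as np

def tdmp_decode(H, llr, max_iters=10):
    """Sketch of the layered (TDMP) iteration; min-sum stands in for BCJR."""
    M, N = H.shape
    mem = llr.astype(float).copy()                # one memory element per column
    row_msgs = np.zeros((M, N))                   # last decoding result per row

    hard = (mem < 0).astype(int)
    for _ in range(max_iters):
        for i in range(M):                        # rows are processed one by one
            cols = np.nonzero(H[i])[0]            # only non-zero entries are used
            diff = mem[cols] - row_msgs[i, cols]  # memory content minus old row result
            new = np.empty_like(diff)             # decode the row (min-sum stand-in)
            for k in range(diff.size):
                rest = np.delete(diff, k)
                new[k] = np.prod(np.sign(rest)) * np.min(np.abs(rest))
            row_msgs[i, cols] = new               # decoder output written back for the row
            mem[cols] = diff + new                # sum of output and difference -> memory
        hard = (mem < 0).astype(int)
        if not (H @ hard % 2).any():              # stop once all parity checks pass
            break
    return hard
```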
It will be appreciated that each decoding iteration requires read and write access to the memory. During this read and write access only those memory elements are operated upon that are associated with a column of the parity check matrix that comprises a non-zero entry. If two rows were operated upon, each having a non-zero entry in the same column position, then the decoders operating on the two rows may attempt to simultaneously access the memory element associated with that column position, thereby causing a conflict between the two decoders' processes that may falsify the decoder outputs. If, in contrast, two rows could be chosen so that the two rows together comprise at most one non-zero entry for each column position, then the decoders decoding the two rows could not conflict when accessing the memory elements and the two rows could be decoded in parallel and independently.
Mansour proposes to design parity check matrices so that they comprise a number of rows that, if grouped in a block of rows, comprise at most one non-zero entry per column. In other words, the weight of each column of the row block proposed by Mansour is at most one. Such matrices have been termed Architecture-Aware Sparse Parity Check Matrices (AA-SPCM) and an example of such a matrix is shown in Figure 1. As can be seen from Figure 1, the AA-SPCM is constructed from sub-matrices, where I is the SxS identity matrix and I_Pi is I with its columns permuted according to the permutation Pi. The resulting parity check matrix has M rows made up of D row blocks A_1 to A_D, each of which has S rows. The matrix also has N columns made up of B column blocks B_1 to B_B, each of which has S columns. This matrix architecture ensures that for the rows in each of the D row blocks only one non-zero entry is provided per column.
Figure 2 shows a matrix H comprising four row blocks, each row block comprising B blocks. Each block is a permuted identity matrix. As can be seen from Figure 2, each column in each block comprises exactly one non-zero entry. The matrix shown in Figure 2 relates to a partially parallel version of the TDMP algorithm (P-TDMP). The P-TDMP algorithm proposed by Mansour is partially parallel in that it allows parallel decoding of all six rows of one sub-block (e.g. row block A_1) by simultaneously using six decoders/processors. The same six decoders can subsequently be employed to decode in parallel the rows of the next sub-block (e.g. row block A_2).
Decoding of a row involves processing values associated with the non-zero entries within that row. Each row decoder has a local memory for storing these values. Additionally, a global memory stores the sum of all decoded values generated by the row decoding process. By using permuted identity matrices to construct a row-block as shown in Figure 2, where S=6, each column within a row block has only a single non-zero entry, i.e. no rows overlap.
Since the global memory accumulates updates from each processed row and no rows within a row-block overlap, each row can be decoded independently.
Figure 3 shows a decoding system comprising six decoders 15, each comprising a local memory 20. A globally shared memory 25 of size D*c, where c is the row weight, is used for storing the accumulated messages, which pertain to the decoded output bit (and column position). Each local memory 20 is shown to hold one of the rows A_{i,j} of the row blocks A_1 to A_4, with i denoting the number of the row block A_1 to A_4 and j denoting the number of the row within the row block.
SUMMARY OF THE INVENTION
According to an aspect of the present invention there is provided a decoder comprising: a parity check matrix comprising non-zero entries; a scheduler specifying an order for the decoding of some or all of the non-zero entries of the parity check matrix; a plurality of sub-decoders for decoding at most a corresponding plurality of non-zero entries of the parity check matrix in parallel based on a received input signal, the sub-decoders arranged to decode the non-zero entries in accordance with a layered decoding algorithm and in the order specified by the scheduler, the operation of the sub-decoders arranged in time slots, each time slot for decoding at most one non-zero entry of a row by each sub-decoder; wherein the scheduler assigns each of the decoders to a row of the parity check matrix so that the sub-decoders decode rows of the parity check matrix in parallel, two of the rows each comprising a non-zero entry in the same column; wherein the scheduler causes the sub-decoders to operate on the non-zero entries of the rows in an order that avoids operation on non-zero entries in the same column in the same time slot.
By scheduling the operation of the sub-decoders so that, if two or more rows that have non-zero entries in the same column are encountered in a parity check matrix, as would normally be expected, no two sub-decoders operate on non-zero values in the same column of the matrix at any one time, it is ensured that no memory element that may be used for storing computational results of the decoding process is accessed by more than one of the sub-decoders in each time slot. Memory contention problems that have limited the throughput of the P-TDMP algorithm are thus eliminated and the throughput of the decoder can therefore be considerably higher than the throughput achievable using the P-TDMP algorithm.
The layered decoding algorithm may be a turbo decoding message passing (TDMP) algorithm. The decoder may further comprise a memory arranged to store decoding results created by the sub-decoders. The memory may comprise one memory element for each of the columns of the parity check matrix so that decoding results generated by the sub-decoders for a particular column can be stored in the associated memory element and read therefrom for use in later iterative steps. The sub-decoders may be arranged to have read and write access to the memory elements. A decoding algorithm performed by the decoder/sub-decoders may be substantially in accordance with the TDMP decoding algorithm but having the decoder time slots of the sub-decoders scheduled as set out above to avoid memory contention.
The parity check matrix may comprise a plurality of sub-sets of rows, wherein the weight of each column in each subset is at most one and wherein the two rows are located in different sub-sets. The weight of each column may be exactly one. The parity check matrix may thus correspond to a known parity check matrix, such as one of the matrices shown in Figures 1 and 2. The scheduler described herein, however, enables the decoder to conduct parallel decoding operations of rows without being confined to parallel decoding in a particular subset of rows or in a particular row-block. The decoder described herein can therefore conduct parallel decoding of some or all of the rows in a subset of rows and additionally can in parallel conduct decoding of other rows outside of the subset, as the order in which the non-zero values of the rows are operated upon is chosen such that memory contention is avoided. The number of rows decoded may thus be larger than the number of rows in each sub-set of the parity check matrix. The number of sub-decoders provided is therefore correspondingly larger than the number of rows in the sub-sets/row blocks of the matrix. Each sub-set of rows may comprise a plurality of permuted identity matrices. Additionally each sub-set of rows may also comprise one or more zero matrices. Alternatively one or more, or all, of the sub-sets of rows may consist of permuted identity matrices.
The scheduler may cause the sub-decoders to operate on the non-zero entries of one or both of the two rows in an order that differs from the order in which the non-zero entries occur in the row. It may thus be possible to use each of the available time slots of each of the sub-decoders for the decoding of a non-zero entry of the parity check matrix, thereby maximising throughput.
The scheduler may be arranged to cause a sub-decoder to operate on the non-zero entries of one of the two rows in ascending column order and another sub-decoder to operate on the non-zero entries of the other one of the two rows in descending column order. By decoding the non-zero elements of one row in the order defined by the non-zero elements' column position in the matrix and by decoding the non-zero elements of the second row in an order opposite to the order defined by the non-zero elements' column position in the matrix, it is ensured that, although two non-zero elements in the same column have to be operated upon, these two non-zero elements are not operated upon in the same time slot. Decoding of the first one of the two non-zero elements will therefore produce an output that can be stored in an associated memory element. Decoding of the second non-zero element can then be based on this stored output.
Instead of or in addition to reversing the order in which the non-zero elements of a row are decoded the scheduler may prescribe that decoding of a row does not start with either of the first or last non-zero row element in the row.
Instead, the scheduler may be arranged to cause a sub-decoder to operate on the non-zero entries of one or both of the two rows in ascending column order but not starting with the first non-zero entry in the row. Alternatively the scheduler may be arranged to cause a sub-decoder to operate on the non-zero entries of one or both of the two rows in descending column order but not starting with the last non-zero entry in the row.
It is also envisaged that the order in which the non-zero elements appear in a row is not maintained at all by the scheduler. Instead, the sub-decoders may be scheduled to operate on the non-zero elements in any order, provided that no two non-zero elements in one column are operated upon within the same time slot.
The decoder may comprise a first scheduler arranged to cause a first number of processors to operate in parallel on a corresponding number of rows of the matrix. The decoder may further comprise a second scheduler arranged to cause a second number of processors to operate in parallel on a corresponding second number of rows of the matrix, wherein the second number is smaller than the first number. A switch for switching the sub-decoder from a mode in which processors are operated using the first scheduler to a mode in which processors are operated using the second scheduler may also be provided.
The decoder may therefore be operated in one of two or more modes.
One of these modes may, for example, be a highly parallel mode operating on a parity check matrix utilising the scheduler to achieve parallelism beyond that introduced by Mansour, as discussed above. In a second mode the decoder could be operated to provide the degree of parallelism suggested by Mansour by decoding in parallel only those rows that are provided in a sub-set of rows/row block. Reducing the degree of parallelism of course reduces the throughput achievable by the decoder. At the same time, however, a reduction in the degree of parallelism can bring about a reduction in power consumption, for example if a smaller number of sub-processors is simultaneously operated. Thus, should a low power state be detected in a device comprising the decoder, the decoder can be switched to the second mode for power saving purposes.
It can alternatively be envisaged that a device receiving a signal moves between areas of good and poor signal quality. As only a given amount of throughput is required, the device may be switched from the first mode, which may have to be employed to conduct the higher number of iterative steps needed to decode a signal of poor quality, to the second mode for power saving purposes if the received signal is of such improved quality that a lower number of iterative steps suffices.
This has been recognised as advantageous in its own right and according to another aspect of the present invention there is therefore provided a decoder comprising: a parity check matrix comprising non-zero entries; a plurality of sub-decoders for parallel decoding of a plurality of rows of the parity check matrix based on a received input signal in accordance with a layered decoding algorithm; a first scheduler specifying a first order for decoding the rows of the parity check matrix using a first number of the sub-decoders; and a second scheduler specifying a second order for decoding the rows of the parity check matrix using a second number of the sub-decoders; wherein the first number of sub-decoders is larger than the second number of sub-decoders.
The decoder may further comprise a switch for switching the decoder from a mode in which the sub-decoders are arranged to decode the non-zero elements of the matrix according to the first order specified by the first scheduler to a mode in which the sub-decoders are arranged to decode the non-zero elements of the matrix according to the second order specified by the second scheduler.
The decoder may comprise an input for receiving a signal for switching said switch. This input may relate to a signal from a host controlling the operation of the decoder to switch between the modes. Alternatively or additionally the input may relate to a power monitor arranged to send a signal to the decoder causing the decoder to switch to the second mode if the power available for operating the decoder falls below a predetermined threshold.
It has further been recognised that, even if the parity check matrix is an Architecture-Aware Sparse Parity Check Matrix (AA-SPCM), which already allows a degree of parallelism, it may nevertheless be possible to decode rows of the AA-SPCM in parallel if these rows are not in the same row block/sub-block of the matrix, that is if the rows do not form part of the same permuted identity matrix. Different decoding orders for AA-SPCMs are therefore possible.
This has been recognised as being advantageous in its own right and according to another aspect of the present invention there is provided a decoder comprising: a parity check matrix comprising non-zero entries, the matrix comprising a plurality of sub-sets of rows, each sub-set of rows comprising a plurality of permuted identity matrices; a plurality of sub-decoders for parallel decoding of a plurality of rows of the parity check matrix based on a received input signal in accordance with a layered decoding algorithm; a scheduler specifying the order in which the sub-decoders in use decode the rows, the scheduler arranged to cause the sub-decoders to simultaneously decode two or more rows in parallel, wherein said two or more rows are from more than one sub-set of rows and wherein the two or more rows are selected so that each column comprises no more than one non-zero entry.
One or more of the sub-sets of rows may also each comprise one or more zero matrices. Alternatively one or more, or all, of the sub-sets of rows may consist of permuted identity matrices.
An arrangement may therefore be provided where all of the rows of a sub-set of rows are decoded in parallel, as envisaged by Mansour, and where additionally further rows are decoded at the same time to increase throughput.
Such further parallel decoding can, for example, be possible where one sub-set of rows comprises zero matrices at column positions where another sub-set of rows comprises permuted identity matrices and vice versa.
The present invention also extends to a device comprising the decoder.
Such a device may comprise a receiver and the decoder, wherein the decoder is arranged to receive signals from said receiver. The device may be a terminal, a base station or an access point. The device may alternatively be a memory storage device comprising the decoder.
According to another aspect of the present invention there is provided a method of decoding a signal in accordance with a layered decoding algorithm comprising: providing a parity check matrix comprising non-zero entries; decoding a plurality of rows of the parity check matrix in parallel based on a received input signal using a corresponding plurality of decoders, wherein at least two rows of the plurality of rows each comprise a non-zero entry in the same column, the operation of the decoders arranged in time slots, each decoder decoding one non-zero entry of a row in each time slot in accordance with a layered decoding algorithm, wherein the decoders operate on the non-zero entries of the rows in an order that avoids operation on non-zero entries in the same column in the same time slot.
According to another aspect of the present invention there is provided a method of decoding a signal comprising: using a plurality of decoders to decode a parity check matrix in accordance with a layered decoding algorithm according to a first order, wherein a first number of rows of the matrix is decoded in parallel; changing the order in which the rows of a parity check matrix are decoded to a second order, wherein a second number of rows of the matrix is decoded in parallel in accordance with the layered decoding algorithm according to the second order, wherein the second number is smaller than the first number.
According to another aspect of the present invention there is provided a method of decoding a signal comprising: providing a parity check matrix comprising non-zero entries, the matrix comprising a plurality of sub-sets of rows, each sub-set of rows comprising or consisting of a plurality of permuted identity matrices; selecting two or more rows from more than one of the sub-sets so that the selected rows comprise at most one non-zero entry in each column; causing a plurality of sub-decoders to simultaneously decode the selected rows in accordance with a layered decoding algorithm.
One or more of the sub-sets of rows may also each comprise one or more zero matrices. Alternatively one or more, or all, of the sub-sets of rows may consist of permuted identity matrices. The layered decoding algorithm used in any of the above disclosed methods may be a turbo decoding message passing (TDMP) algorithm.
According to another aspect of the present invention there is provided a method of creating a schedule suitable for decoding a parity check matrix according to a layered decoding algorithm when a predetermined number of decoders is used, the schedule suitable for use with decoders having a time slot based operation so that one non-zero entry of the matrix is decoded in each time slot for each decoder, the method comprising: a) allocating a row of the matrix to each decoder; b) selecting a non-zero entry of the matrix; c) selecting a free decoding time slot of the decoder associated with the row comprising the non-zero entry; d) checking if another decoder is scheduled to decode a non-zero entry in the column of the matrix comprising the selected non-zero entry in the selected time slot; e) if it is found that another decoder is scheduled to decode a non-zero entry in the column of the matrix comprising the selected non-zero entry in the selected time slot selecting another free decoding time slot of the decoder associated with the row comprising the non-zero entry; f) repeating steps d) and e) until it is found that no other decoder is scheduled to decode a non-zero entry in the column of the matrix comprising the selected non-zero entry in the selected time slot; g) repeating steps b) to f) until a time slot has been allocated to each non-zero entry of the matrix.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows a known Architecture-Aware Sparse Parity Check Matrix; Figure 2 shows a sparse density parity check matrix comprising blocks of permuted identity matrices; Figure 3 shows a known parallel decoder system comprising six parallel decoders; Figure 4 shows the non-zero entries of two rows n and k of a sparse density parity check matrix; Figure 5 illustrates the general case of a number of rows comprising non-zero entries; Figure 6 shows an algorithm for computing a schedule; Figure 7 shows a parity check matrix; Figure 8 shows a schedule for decoding the parity check matrix shown in Figure 7 using the P-TDMP algorithm; Figure 9 shows a schedule for decoding the parity check matrix shown in Figure 7 created using the algorithm illustrated in Figure 6; Figure 10 shows a further exemplary schedule for decoding the parity check matrix shown in Figure 7; Figure 11 shows another parity check matrix that is not based on permuted identity matrices; and Figure 12 shows a schedule for decoding the parity check matrix shown in Figure 11 using the algorithm shown in Figure 6.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Figure 4 shows the non-zero entries of two rows n and k, taken from two different row blocks p and q of a P-TDMP sparse density parity check matrix, such as the matrix shown in Figure 2. Figure 4 indicates only the positions of the non-zero entries of each of the rows. Zero entries of these rows are omitted as they are not operated upon during decoding of the rows. As can be seen from Figure 4, both row n and row k comprise non-zero entries in their respective first columns. Thus, if two decoders were to start decoding rows n and k in parallel in a decoder arrangement such as the arrangement shown in Figure 3, then both decoders would attempt to simultaneously access the memory element storing the accumulated decoding results pertaining to the first column of the matrix. As mentioned above, such memory contention can falsify the decoding result.
This problem is overcome in Figure 4 by decoding row n starting in column one and proceeding to decode the non-zero entries of row n in ascending column order, as would also be the case in the prior art decoder shown in Figure 3. Row k, however, is decoded starting in the last non-zero column, column 52 in this case. Thus, while a first decoder decodes the non-zero entry of column one in row n the second decoder decodes the non-zero entry of column 52 in row k. The first decoder's attempt to access the memory element associated with column one therefore does not conflict with a memory access attempt of the second decoder to the same memory element, as the second decoder operates on a different column and hence has no need to access this memory element. The second decoder also performs a decoding operation for the non-zero entry in column 1 of row k. This decoding operation is, however, performed in a later time slot, after the first decoder has completed its decoding operation of column 1 of row n. Thus memory contention is avoided by re-scheduling the order in which the non-zero entries in each row are decoded.
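As a minimal illustration (not from the original disclosure), the following sketch assumes two hypothetical rows that share column 1, with row k's last non-zero entry in column 52; decoding one row in ascending and the other in descending column order keeps the two decoders on different columns in every time slot.

```python
row_n = [1, 7, 19, 40]               # hypothetical non-zero columns of row n
row_k = [1, 12, 33, 52]              # hypothetical non-zero columns of row k

schedule_n = row_n                   # decoded in ascending column order
schedule_k = row_k[::-1]             # decoded in descending column order

for slot, (cn, ck) in enumerate(zip(schedule_n, schedule_k)):
    assert cn != ck                  # never the same column in the same slot
    print(f"slot {slot}: decoder n -> column {cn}, decoder k -> column {ck}")
```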
Returning to the example shown in Figure 2, row 1 in row-block A_1 shares an entry in the first column position with row 1 of another row-block. Thus, in this case a contention for the same globally shared memory location would occur if both rows were being decoded at the same time using the P-TDMP decoding method. By starting at different points within the two rows this memory contention problem is solved, thereby enabling the decoding system to decode an increased number of rows in parallel, consequently increasing decoder throughput.
Figure 5 shows the general case where r_{i,j} is the column position containing a "1" in the ith row, for j = 1,...,c_i, where c_i is the weight of row i. For any pair of rows from different row-blocks where there is a common column entry we have:

r_{i,j} = r_{i',j'},   i = 1,...,M, j = 1,...,c_i; i' = 1,...,M, j' = 1,...,c_{i'}; i != i'   (1)

For the case where j != j', that is in cases where the non-zero entries of two rows are not in the same columns, decoding of both rows can start at the same position. For the case where j = j', that is in cases where two rows both have a non-zero entry in the same column, decoding needs to start at different positions for the two rows.
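A one-line check of condition (1), with hypothetical column lists, might look as follows:

```python
def rows_conflict(r_i, r_k):
    """Condition (1): two rows conflict if any non-zero column positions coincide."""
    return bool(set(r_i) & set(r_k))

# Rows with non-zero columns [1, 7, 19] and [2, 7, 23] share column 7 (j = j'),
# so decoding of the two rows must start at different positions.
assert rows_conflict([1, 7, 19], [2, 7, 23])
```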
An algorithm 200 for generating a schedule according to which the parallel sub-decoders of a decoder may operate on the non-zero entries of a parity check matrix is illustrated in Figure 6. This algorithm can be applied to a given parity check matrix to calculate the scheduling order of all available decoders prior to processing of data. The resulting schedule can then be incorporated in a decoder.
As discussed above, it is important that two non-zero entries in a column are not decoded in the same time slot so that memory elements storing decoding results are not accessed simultaneously. Figure 6 shows a flow chart for creating a schedule for operating a number of decoders in parallel to decode the rows of a parity check matrix in a TDMP decoding algorithm. The algorithm 200 shown in Figure 6 allocates time slots for decoding the non-zero entries of the rows of the parity check matrix on a column by column basis. It will be appreciated that the algorithm in Figure 6 is merely one of a number of possible algorithms that may be usable for creating a schedule for operating a number of decoders in parallel to decode the rows of a parity check matrix in a TDMP decoding algorithm and that the present invention is not limited to this particular algorithm.
In a first step 205 the non-zero entries for a column of the parity check matrix are identified. In the next step 210 a time slot that is not yet in use in decoding the non-zero entries of the row comprising the non-zero entry in question is allocated for decoding this particular non-zero entry. To avoid memory contention it is next checked in step 215 if the chosen time slot is already used by a decoder responsible for the decoding of a different row for decoding a non-zero entry of the column under consideration. Should this be the case, then the next available time slot for the row is instead allocated to the decoding of the non-zero entry in question in step 220. The checking step 215 is then repeated to ensure that allocation of the newly chosen time slot does not cause memory contention problems.
Once a suitable time slot has been chosen for the decoding of a particular non-zero entry in a particular column the method moves on to the next non-zero entry in the column in steps 225 and 230 or to the next column (steps 235 and 240) if all non-zero entries in a column have been allocated a time slot. If all columns have been processed the schedule is completed and the method terminates.
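A compact Python sketch of this column-by-column allocation (one decoder per row assumed; the names are hypothetical and not part of the original disclosure) is:

```python
from collections import defaultdict

def build_schedule(H):
    """Greedy column-by-column scheduler following the steps of Figure 6."""
    n_rows, n_cols = len(H), len(H[0])
    row_busy = [set() for _ in range(n_rows)]   # slots already used per decoder
    col_busy = defaultdict(set)                 # slots already used per column
    schedule = defaultdict(list)                # schedule[row] = [(slot, column), ...]

    for col in range(n_cols):                   # step 205: non-zero entries of a column
        for row in range(n_rows):
            if not H[row][col]:
                continue
            slot = 0
            # steps 210/220: pick the next slot that is free for this decoder and
            # (step 215) not already used by another decoder for this column
            while slot in row_busy[row] or slot in col_busy[col]:
                slot += 1
            row_busy[row].add(slot)
            col_busy[col].add(slot)
            schedule[row].append((slot, col))
    return schedule
```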
Figure 7 shows a parity check matrix with the non-zero column positions in each row shown to the right. In a P-TDMP decoder according to Mansour's proposal three decoders would be used in parallel to decode the three rows of a row block. The first row block has a row weight of three, thus three parallel decoders would take three time slots to process the three non-zero values in each row. The next row block has a row weight of four, resulting in three decoders taking four time slots to process the four non-zero values in a row.
The third row block also has a row weight of four and therefore would take four time slots to process using three parallel decoders.
Figure 8 shows the timing diagram/schedule that may be used in the row-block based approach of decoding proposed by Mansour using three decoders operating in parallel on the three rows in each row block. The number entries in the table of Figure 8 indicate the column operated upon by a decoder in a given time slot. As can be seen from Figure 8, no column is operated upon by two decoders in the same time slot, therefore providing the basis for parallel processing on the rows in the row blocks.
It can be seen from Figure 8 that it would be difficult to achieve a further improvement in parallel processing (and therefore in throughput) using the P-TDMP method, even if further decoders were provided. For example, if one or more of rows 4 to 6 were decoded in parallel with rows 1 to 3, then in the third time slot the decoders would attempt to simultaneously access the memory associated with columns 13 to 15. Equally, if one or more of rows 7 to 9 were decoded in parallel with rows 1 to 3, then in the first time slot the decoders would attempt to simultaneously access the memory associated with columns 1 to 3.
Using the scheduling algorithm shown in Figure 6 provides the schedule shown in Figure 9. With regard to the method illustrated by the flow chart shown in Figure 6, this method allocates time slots on a column by column basis. Thus, when analysing the non-zero entries in the first column of the matrix shown in Figure 7 the method of Figure 6 first determines in step 205 that the column comprises non-zero entries in rows 2 and 8. If nine decoders are to be operated in parallel (which is assumed for the purpose of this discussion), then decoders 2 and 8 would be tasked with decoding these values. None of the time slots available to decoder 2 has been allocated when step 210 is first executed and the non-zero entry in the second row of the matrix of Figure 7 is thus scheduled in the first time slot, as is shown in Figure 10.
The answer to the question in step 215 is "No" as no other time slots have yet been allocated and the method next moves on to step 210 (through steps 225 and 230) to schedule a time slot for processing the non-zero entry in column 1 of row 8. The first available time slot for decoder 8 is the first time slot. A check in step 215, however, shows that this time slot has already been allocated to the decoding of the first column non-zero entry of row 2. As a consequence decoder 8 is instead scheduled to decode the non-zero entry in column 1 of row 8 in the next time slot available to decoder 8, namely the second time slot, as shown in Figure 10. A check in step 215 shows that this time slot is not yet in use for decoding a non-zero entry in column 1.
The method shown in Figure 6 then moves on to the second column.
The first time slot of decoder 3 is allocated to the decoding of the non-zero entry in row 3 in the manner discussed above with regard to the first non-zero entry in column 1 of row 2. Equally, the second time slot of decoder 9 is allocated to the decoding of the non-zero entry in the second column of row 9, as allocation of the first time slot would lead to memory contention with decoder 3.
The method shown in Figure 6 continues in this fashion until column 7 is reached. The non-zero entry in column 7 is in row 9. The next available time slot for decoder 9 is the first time slot despite the fact that the non-zero entry in column 7 is not the first non-zero entry in row 9. This is because decoding of the first non-zero entry in row 9 is conducted in the second time slot to avoid memory contention with the decoder of row 3, as discussed above. The method of Figure 6 thus causes the first two non-zero entries of row 9 not to be decoded in the column order in which the non-zero entries occur in the row but in reverse order. The person skilled in the art will be able to follow the remainder of the time-slot allocation routine illustrated in Figure 6 as applied to the matrix shown in Figure 7 based on the above explanation.
An alternative schedule for decoding the non-zero entries of the matrix shown in Figure 7 using nine parallel decoders is shown in Figure 10.
As can be seen from Figures 9 and 10, the parallel operation of nine decoders is scheduled. It will be noticed that no column position is mentioned twice in each time slot. Thus, despite the high degree of parallelism achieved by the schedules shown in Figures 9 and 10, no memory contention occurs. It will be noticed that Figures 9 and 10 differ from Figure 8 in that, in contrast to Figure 8, the schedules of Figures 9 and 10 do not prescribe that the non-zero entries in each of the rows have to be operated upon in a sequence defined by ascending column number. Instead the schedules of Figures 9 and 10 require the decoders to operate on the non-zero entries in an order that does not correspond to the order in which the non-zero entries appear in the rows.
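Continuing the sketch above (the matrix of Figure 7 is not reproduced in the text, so a small hypothetical matrix is used), the resulting schedule can be checked for the property just described, namely that no column appears twice in any time slot:

```python
from collections import defaultdict

# Reuses build_schedule() from the sketch above; H is a hypothetical matrix.
H = [[1, 0, 1, 0, 1, 0],
     [0, 1, 0, 1, 0, 1],
     [1, 1, 0, 0, 0, 1],
     [0, 0, 1, 1, 1, 0]]

sched = build_schedule(H)
used = defaultdict(set)                # columns decoded in each time slot
for row, entries in sched.items():
    for slot, col in entries:
        assert col not in used[slot]   # no column is decoded twice per slot
        used[slot].add(col)
```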
The throughput for the parallel TDMP algorithm described by Mansour, for a decoder operating at a frequency f and performing T decoding iterations using b bits to represent the messages, is given as:

Throughput = (b · f · B · S) / (c · T · D) bits/s    (2)

With the proposed decoding schedule the potential throughput would be:

Throughput = (b · f · B · P_R · S) / (c · T · M) bits/s    (3)

where M is the total number of rows in the parity check matrix, P_R is the number of rows being decoded in parallel and c is the average row weight. In the limit, where all rows may be decoded in parallel, P_R = M, resulting in a D-fold increase in throughput.
Let b = 1, f = 1, T = 1, c = (3×3 + 4×6)/9 ≈ 3.7, B = 7, D = 3, M = 9, P_R = 9 and S = 3; then from equations (2) and (3) the throughputs for the conventional and proposed decoder are calculated to be 1.9 and 5.7 bits/s respectively. A threefold increase in throughput over that provided by Mansour is therefore achieved.
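The quoted figures can be reproduced numerically (note that the numerators of equations (2) and (3) are reconstructed from the surrounding definitions and from the quoted 1.9 and 5.7 bits/s results):

```python
b = f = T = 1
c = (3 * 3 + 4 * 6) / 9                          # average row weight, ~3.7
B, D, M, P_R, S = 7, 3, 9, 9, 3

t_ptdmp    = b * f * B * S / (c * T * D)         # eq. (2)
t_proposed = b * f * B * P_R * S / (c * T * M)   # eq. (3)
print(round(t_ptdmp, 1), round(t_proposed, 1))   # 1.9 5.7
```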
The parity check matrix shown in Figure 7 comprises three row blocks, wherein each column in a row block has a weight of at most one. The present invention can also be applied to parity check matrices of a different type. In particular the present invention may be applied to parity check matrices that comprise more non-zero entries than the parity check matrix shown in Figure 7.
An example of such a different parity check matrix is shown in Figure 11. As can be seen from this Figure, each of the three row blocks of the parity check matrix comprises columns with a weight greater than one. Thus, if it were attempted to use this parity check matrix in a P-TDMP decoding operation, memory contention problems would occur if the rows of each row block were to be decoded in parallel.
The present invention allows allocating the time slots of a number of parallel sub-decoders so that no two non-zero entries in a column of the parity check matrix shown in Figure 11 are operated upon in the same time slot by two sub-decoders. The method illustrated in the flow chart shown in Figure 6 may, for example, be used to allocate each decoding operation required when the parity check matrix shown in Figure 11 is used for decoding to a number of parallel decoders while avoiding memory contention problems. Figure 12 shows a schedule created by application of the method illustrated in Figure 6 to the parity check matrix shown in Figure 11. The operation of the method illustrated in Figure 6 has been described above.
As can be seen from Figure 12, decoding of the parity check matrix shown in Figure 11 can be performed in as few as seven time slots if nine parallel sub-decoders are used. Seven time slots are required as row seven of the parity check matrix comprises seven non-zero entries.
It will be appreciated that some degree of parallel processing may be possible for the parity check matrix shown in Figure 11 even without the use of the present invention if the rows of the parity check matrix were simply reordered for decoding with a smaller number (say three) of sub-decoders. It can, for example, be envisaged that rows 1, 6 and 8 are decoded in parallel using three sub-decoders. Six time slots would be required for this operation due to the weight of rows 6 and 8. Rows 2, 4 and 9 could subsequently be decoded in parallel using the same sub-decoders (requiring a further six time slots due to the weight of row 4) and rows 5 and 7 could then be decoded in yet another decoding cycle (requiring a further seven time slots due to the weight of row 7), followed by a decoding of the last remaining row, row 3, which requires a further four time slots. It will be appreciated that such a decoding operation is, however, considerably more time consuming than a decoding operation following the schedule shown in Figure 12, as it would require at least 23 time slots as opposed to the seven required when the schedule shown in Figure 12 is used. No doubt many other effective alternatives will occur to the skilled person. It will be understood that the invention is not limited to the described embodiments and encompasses modifications apparent to those skilled in the art lying within the spirit and scope of the claims appended hereto.

Claims (22)

  1. A decoder comprising: a parity check matrix comprising non-zero entries; a scheduler specifying an order for the decoding of some or all of the non-zero entries of the parity check matrix; a plurality of sub-decoders for decoding at most a corresponding plurality of non-zero entries of the parity check matrix in parallel based on a received input signal, the sub-decoders arranged to decode the non-zero entries in accordance with a layered decoding algorithm and in the order specified by the scheduler, the operation of the sub-decoders arranged in time slots, each time slot for decoding at most one non-zero entry of a row by each sub-decoder; wherein the scheduler assigns each of the decoders to a row of the parity check matrix so that the sub-decoders decode rows of the parity check matrix in parallel, two of the rows each comprising a non-zero entry in the same column; wherein the scheduler causes the sub-decoders to operate on the non-zero entries of the rows in an order that avoids operation on non-zero entries in the same column in the same time slot.
  2. A decoder according to claim 1, wherein the parity check matrix comprises a plurality of sub-sets of rows, wherein the weight of each column in each subset is at most one and wherein the two rows are located in different sub-sets.
  3. A decoder according to claim 2, wherein the weight of each column in each subset is exactly one.
  4. A decoder according to claims 2 or 3, wherein each sub-set of rows comprises a plurality of permuted identity matrices.
  5. A decoder according to any preceding claim, wherein the scheduler is arranged to cause a sub-decoder to operate on the non-zero entries of one of the two rows in ascending column order and another sub-decoder to operate on the non-zero entries of the other one of the two rows in descending column order.
  6. A decoder according to any of claims 1 to 4, wherein the scheduler is arranged to cause a sub-decoder to operate on the non-zero entries of one or both of the two rows in ascending column order but not starting with the first non-zero entry in the row.
  7. A decoder according to any of claims 1 to 4, wherein the scheduler is arranged to cause a sub-decoder to operate on the non-zero entries of one or both of the two rows in descending column order but not starting with the last non-zero entry in the row.
  8. A decoder as claimed in any preceding claim, wherein the scheduler is a first scheduler arranged to cause a first number of processors to operate in parallel on a corresponding number of rows of the matrix, the decoder further comprising a second scheduler arranged to cause a second number of processors to operate in parallel on a corresponding second number of rows of the matrix, the second number being smaller than the first number; and a switch for switching the sub-decoder from a mode in which processors are operated using the first schedule to a mode in which processors are operated using the second schedule.
  9. A decoder as claimed in any preceding claim, wherein the layered decoding algorithm is a turbo decoding message passing (TDMP) algorithm.
  10. A decoder comprising: a parity check matrix comprising non-zero entries; a plurality of sub-decoders for parallel decoding of a plurality of rows of the parity check matrix based on a received input signal in accordance with a layered decoding algorithm; a first scheduler specifying a first order for decoding the rows of the parity check matrix using a first number of the sub-decoders; and a second scheduler specifying a second order for decoding the rows of the parity check matrix using a second number of the sub-decoders; wherein the first number of sub-decoders is larger than the second number of sub-decoders; the decoder further comprising a switch for switching the decoder from a mode in which the sub-decoders are arranged to decode the non-zero elements of the matrix according to the first order specified by the first scheduler to a mode in which the sub-decoders are arranged to decode the non-zero elements of the matrix according to the second order specified by the second scheduler.
  11. A decoder comprising: a parity check matrix comprising non-zero entries, the matrix comprising a plurality of sub-sets of rows, each sub-set of rows comprising a plurality of permuted identity matrices; a plurality of sub-decoders for parallel decoding of a plurality of rows of the parity check matrix based on a received input signal and in accordance with a layered decoding algorithm; a scheduler specifying the order in which the sub-decoders in use decode the rows, the scheduler arranged to cause the sub-decoders to simultaneously decode two or more rows in parallel, wherein said two or more rows are from more than one sub-set of rows and wherein the two or more rows are selected so that each column comprises no more than one non-zero entry.
  12. A terminal, a base station or an access point comprising a decoder according to any preceding claim.
  13. A memory storage device comprising a decoder according to any of claims 1 to 11.
  14. A method of decoding a signal in accordance with a layered decoding algorithm comprising: providing a parity check matrix comprising non-zero entries; decoding a plurality of rows of the parity check matrix in parallel based on a received input signal using a corresponding plurality of decoders, wherein at least two rows of the plurality of rows each comprise a non-zero entry in the same column, the operation of the decoders arranged in time slots, each decoder decoding one non-zero entry of a row in each time slot in accordance with the layered decoding algorithm, wherein the decoders operate on the non-zero entries of the rows in an order that avoids operation on non-zero entries in the same column in the same time slot.
  15. A method according to claim 14, wherein providing a parity check matrix comprises providing a matrix with a plurality of sub-sets of rows, wherein the weight of each column in each subset is at most one and wherein the two rows are located in different sub-sets.
  16. A method according to claim 15, wherein each sub-set of rows comprises a plurality of permuted identity matrices.
  17. A method according to claims 14, 15 or 16, wherein one of the decoders operates on the non-zero entries of one of the two rows in ascending column order and another decoder operates on the non-zero entries of the other one of the two rows in descending column order.
  18. A method according to any of claims 14 to 17, wherein a decoder decodes the non-zero entries of at least one of the two rows in ascending column order but not starting with the first non-zero entry in the row.
  19. A method according to any of claims 14 to 18, wherein a decoder operates on the non-zero entries of at least one of the two rows in descending column order but not starting with the last non-zero entry in the row.
  20. A method according to any of claims 14 to 19, wherein said simultaneous decoding is according to a first order, the method further comprising switching the order in which the parity check matrix is decoded from the first order to a second order, wherein when decoding the parity check matrix according to the second order a smaller number of decoders is used than when decoding the parity check matrix according to the first order.
  21. A method of decoding a signal comprising: using a plurality of decoders to decode a parity check matrix in accordance with a layered decoding algorithm and according to a first order, wherein a first number of rows of the matrix is decoded in parallel; changing the order in which the rows of the parity check matrix are decoded to a second order, wherein a second number of rows of the matrix is decoded in parallel in accordance with the layered decoding algorithm according to the second order, wherein the second number is smaller than the first number.
22. A method of decoding a signal comprising: providing a parity check matrix comprising non-zero entries, the matrix comprising a plurality of sub-sets of rows, each sub-set of rows comprising a plurality of permuted identity matrices; selecting two or more rows from more than one of the sub-sets so that the selected rows comprise at most one non-zero entry in each column; and causing a plurality of sub-decoders to simultaneously decode the selected rows in accordance with a layered decoding algorithm.

24. A method according to any of claims 14 to 22, wherein the layered decoding algorithm is a turbo decoding message passing (TDMP) algorithm.

25. A method of creating a schedule suitable for decoding a parity check matrix according to a layered decoding algorithm when a predetermined number of decoders is used, the schedule suitable for use with decoders having a time slot based operation so that one non-zero entry of the matrix is decoded in each time slot for each decoder, the method comprising:
a) allocating a row of the matrix to each decoder;
b) selecting a non-zero entry of the matrix;
c) selecting a free decoding time slot of the decoder associated with the row comprising the non-zero entry;
d) checking if another decoder is scheduled to decode a non-zero entry in the column of the matrix comprising the selected non-zero entry in the selected time slot;
e) if it is found that another decoder is scheduled to decode a non-zero entry in the column of the matrix comprising the selected non-zero entry in the selected time slot, selecting another free decoding time slot of the decoder associated with the row comprising the non-zero entry;
f) repeating steps d) and e) until it is found that no other decoder is scheduled to decode a non-zero entry in the column of the matrix comprising the selected non-zero entry in the selected time slot;
g) repeating steps b) to f) until a time slot has been allocated to each non-zero entry of the matrix.

Amendments to the Claims have been filed as follows:

CLAIMS:

1. A decoder comprising: a parity check matrix comprising non-zero entries; a scheduler specifying an order for the decoding of some or all of the non-zero entries of the parity check matrix; and a plurality of sub-decoders for decoding at most a corresponding plurality of non-zero entries of the parity check matrix in parallel based on a received input signal, the sub-decoders arranged to decode the non-zero entries in accordance with a layered decoding algorithm and in the order specified by the scheduler, the operation of the sub-decoders arranged in time slots, each time slot for decoding at most one non-zero entry of a row by each sub-decoder; wherein the scheduler assigns each of the sub-decoders to a row of the parity check matrix so that the sub-decoders decode rows of the parity check matrix in parallel, two of the rows each comprising a non-zero entry in the same column; and wherein the scheduler causes the sub-decoders to operate on the non-zero entries of the rows in an order that avoids operation on non-zero entries in the same column in the same time slot.

2. A decoder according to claim 1, wherein the parity check matrix comprises a plurality of sub-sets of rows, wherein the weight of each column in each sub-set is at most one and wherein the two rows are located in different sub-sets.

3. A decoder according to claim 2, wherein the weight of each column in each sub-set is exactly one.

4. A decoder according to claim 2 or 3, wherein each sub-set of rows comprises a plurality of permuted identity matrices.

5. A decoder according to any preceding claim, wherein the scheduler is arranged to cause a sub-decoder to operate on the non-zero entries of one of the two rows in ascending column order and another sub-decoder to operate on the non-zero entries of the other one of the two rows in descending column order.

6. A decoder according to any of claims 1 to 4, wherein the scheduler is arranged to cause a sub-decoder to operate on the non-zero entries of one or both of the two rows in ascending column order but not starting with the first non-zero entry in the row.

7. A decoder according to any of claims 1 to 4, wherein the scheduler is arranged to cause a sub-decoder to operate on the non-zero entries of one or both of the two rows in descending column order but not starting with the last non-zero entry in the row.

8. A decoder as claimed in any preceding claim, wherein the scheduler is a first scheduler arranged to cause a first number of processors to operate in parallel on a corresponding number of rows of the matrix, the decoder further comprising: a second scheduler arranged to cause a second number of processors to operate in parallel on a corresponding second number of rows of the matrix, the second number being smaller than the first number; and a switch for switching the decoder from a mode in which processors are operated using the first schedule to a mode in which processors are operated using the second schedule.

9. A decoder as claimed in any preceding claim, wherein the layered decoding algorithm is a turbo decoding message passing (TDMP) algorithm.

10. A terminal, a base station or an access point comprising a decoder according to any preceding claim.

11. A memory storage device comprising a decoder according to any of claims 1 to 9.

12. A method of decoding a signal in accordance with a layered decoding algorithm comprising: providing a parity check matrix comprising non-zero entries; and decoding a plurality of rows of the parity check matrix in parallel based on a received input signal using a corresponding plurality of decoders, wherein at least two rows of the plurality of rows each comprise a non-zero entry in the same column, the operation of the decoders arranged in time slots, each decoder decoding one non-zero entry of a row in each time slot in accordance with the layered decoding algorithm, wherein the decoders operate on the non-zero entries of the rows in an order that avoids operation on non-zero entries in the same column in the same time slot.

13. A method according to claim 12, wherein providing a parity check matrix comprises providing a matrix with a plurality of sub-sets of rows, wherein the weight of each column in each sub-set is at most one and wherein the two rows are located in different sub-sets.

14. A method according to claim 13, wherein each sub-set of rows comprises a plurality of permuted identity matrices.

15. A method according to claim 12, 13 or 14, wherein one of the decoders operates on the non-zero entries of one of the two rows in ascending column order and another decoder operates on the non-zero entries of the other one of the two rows in descending column order.

16. A method according to any of claims 12 to 15, wherein a decoder decodes the non-zero entries of at least one of the two rows in ascending column order but not starting with the first non-zero entry in the row.

17. A method according to any of claims 12 to 16, wherein a decoder operates on the non-zero entries of at least one of the two rows in descending column order but not starting with the last non-zero entry in the row.

18. A method according to any of claims 12 to 17, wherein said simultaneous decoding is according to a first order, the method further comprising switching the order in which the parity check matrix is decoded from the first order to a second order, wherein when decoding the parity check matrix according to the second order a smaller number of decoders is used than when decoding the parity check matrix according to the first order.

19. A method according to any of claims 12 to 18, wherein the layered decoding algorithm is a turbo decoding message passing (TDMP) algorithm.

20. A method of creating a schedule suitable for decoding a parity check matrix according to a layered decoding algorithm when a predetermined number of decoders is used, the schedule suitable for use with decoders having a time slot based operation so that one non-zero entry of the matrix is decoded in each time slot for each decoder, the method comprising:
a) allocating a row of the matrix to each decoder;
b) selecting a non-zero entry of the matrix;
c) selecting a free decoding time slot of the decoder associated with the row comprising the non-zero entry;
d) checking if another decoder is scheduled to decode a non-zero entry in the column of the matrix comprising the selected non-zero entry in the selected time slot;
e) if it is found that another decoder is scheduled to decode a non-zero entry in the column of the matrix comprising the selected non-zero entry in the selected time slot, selecting another free decoding time slot of the decoder associated with the row comprising the non-zero entry;
f) repeating steps d) and e) until it is found that no other decoder is scheduled to decode a non-zero entry in the column of the matrix comprising the selected non-zero entry in the selected time slot;
g) repeating steps b) to f) until a time slot has been allocated to each non-zero entry of the matrix.
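For illustration only (this sketch is not part of the patent text): claims 5 and 15 describe one sub-decoder walking its row's non-zero columns in ascending order while a second walks its row in descending order, so a column shared by the two rows is accessed in different time slots. A minimal Python sketch, with hypothetical column indices:

```python
# Illustrative sketch only, not part of the patent text: per claims 5 and 15,
# sub-decoder A reads its row's non-zero columns in ascending order while
# sub-decoder B reads its row in descending order, so the shared column is
# never touched by both in the same time slot. Column indices are hypothetical.

row_a = [0, 2, 5, 7]   # columns of the non-zero entries in the first row
row_b = [1, 4, 5, 8]   # columns of the second row; column 5 is shared

slots_a = sorted(row_a)                 # ascending column order
slots_b = sorted(row_b, reverse=True)   # descending column order

for slot, (col_a, col_b) in enumerate(zip(slots_a, slots_b)):
    # the same column must never be accessed by both sub-decoders in one slot
    assert col_a != col_b, f"memory contention on column {col_a} in slot {slot}"
    print(f"slot {slot}: A -> column {col_a}, B -> column {col_b}")
```

With both rows read in ascending order, column 5 would be accessed by both sub-decoders in slot 2; reversing one traversal removes the conflict without extra buffering or stall cycles.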
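Also for illustration only: claims 8, 18 and 21 switch between a high-parallelism schedule (the first order) and a schedule that decodes fewer rows in parallel (the second order). A minimal sketch in which both schedules are assumed to be precomputed and their contents are hypothetical placeholders:

```python
# Illustrative sketch only, not part of the patent text: the mode switch of
# claims 8, 18 and 21. The two schedules and their row counts are hypothetical.

first_schedule = {"sub_decoders": 4, "rows_per_pass": 4}    # first order
second_schedule = {"sub_decoders": 2, "rows_per_pass": 2}   # second order

class ScheduleSwitch:
    """Selects which precomputed schedule drives the sub-decoders."""
    def __init__(self):
        self.active = first_schedule
    def to_second_order(self):
        # fewer rows of the parity check matrix are now decoded in parallel
        self.active = second_schedule

sw = ScheduleSwitch()
sw.to_second_order()
assert sw.active["rows_per_pass"] < first_schedule["rows_per_pass"]
```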
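Finally, a direct greedy reading of steps a) to g) of claims 20 and 25, for illustration only: each non-zero entry is assigned the earliest time slot that is free on its row's decoder and in which no other decoder reads the same column. This sketch assumes each decoder is allocated exactly one row (decoder i handles row i); the toy matrix H is hypothetical.

```python
# Illustrative sketch only, not part of the patent text: greedy construction
# of a contention-free schedule following steps a)-g) of claims 20 and 25.

def build_schedule(H):
    """Assign each non-zero entry of H a time slot on its row's decoder so
    that no two decoders read the same column in the same slot."""
    used_slots = [set() for _ in H]    # a) decoder i handles row i; its taken slots
    column_slots = {}                  # column -> slots in which some decoder reads it
    schedule = {}                      # (row, col) -> time slot on that row's decoder

    for r, row in enumerate(H):
        for c, entry in enumerate(row):          # b) select each non-zero entry
            if entry == 0:
                continue
            slot = 0                             # c) try the first free slot
            # d)-f) advance while the slot is taken on this decoder or another
            # decoder already reads this column in the same slot
            while slot in used_slots[r] or slot in column_slots.get(c, set()):
                slot += 1
            used_slots[r].add(slot)
            column_slots.setdefault(c, set()).add(slot)
            schedule[(r, c)] = slot              # g) repeat until all placed
    return schedule

# Toy example: rows 0 and 1 share column 2; the shared column lands in
# different slots on the two decoders.
H = [[1, 0, 1, 1],
     [0, 1, 1, 0]]
print(build_schedule(H))
```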
GB0816146A 2008-09-04 2008-09-04 A decoder and a method of decoding Expired - Fee Related GB2463252B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0816146A GB2463252B (en) 2008-09-04 2008-09-04 A decoder and a method of decoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0816146A GB2463252B (en) 2008-09-04 2008-09-04 A decoder and a method of decoding

Publications (3)

Publication Number Publication Date
GB0816146D0 GB0816146D0 (en) 2008-10-15
GB2463252A true GB2463252A (en) 2010-03-10
GB2463252B GB2463252B (en) 2010-09-01

Family

ID=39888788

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0816146A Expired - Fee Related GB2463252B (en) 2008-09-04 2008-09-04 A decoder and a method of decoding

Country Status (1)

Country Link
GB (1) GB2463252B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040034828A1 (en) * 2002-08-15 2004-02-19 Texas Instruments Incorporated Hardware-efficient low density parity check code for digital communications
EP1696574A1 (en) * 2005-02-26 2006-08-30 Broadcom Corporation AMP (Accelerated Message Passing) decoder adapted for LDPC (Low Density Parity Check) codes
US20080189589A1 (en) * 2007-02-02 2008-08-07 Samsung Electronics Co., Ltd. Apparatus and method for receiving signal in a communication system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
IEEE Transactions on Signal Processing, vol. 54, no. 11, Nov. 2006, pp. 4376-4392, Mansour, M., "A turbo-decoding message-passing algorithm for sparse parity-check matrix codes" *
Proc. 2007 IEEE International Symposium on Circuits and Systems, New Orleans, 27-30 May 2007, pub. IEEE, US, pp. 1649-1652, Yi-Hsieng, C. et al., "A high throughput H-QC LDPC decoder", ISBN 1-4244-0920-9 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015074666A1 (en) * 2013-11-22 2015-05-28 Kamstrup A/S Consumption meter with error-correction
US9912441B2 (en) 2013-11-22 2018-03-06 Kamstrup A/S Consumption meter with error-correction
WO2016036408A1 (en) * 2014-09-04 2016-03-10 National Instruments Corporation Pipeline layered ldpc decoding with preconfigured memory arbitration
US10078456B2 (en) 2014-09-04 2018-09-18 National Instruments Corporation Memory system configured to avoid memory access hazards for LDPC decoding
US10331361B2 (en) 2014-09-04 2019-06-25 National Instruments Corporation Self-addressing memory
CN109428604A (en) * 2017-09-04 2019-03-05 扬智科技股份有限公司 Using the LDPC code interpretation method and circuit for improving TDMP algorithm
CN109428604B (en) * 2017-09-04 2022-07-29 扬智科技股份有限公司 LDPC code decoding method and circuit adopting improved TDMP algorithm

Also Published As

Publication number Publication date
GB2463252B (en) 2010-09-01
GB0816146D0 (en) 2008-10-15

Similar Documents

Publication Publication Date Title
US9996418B2 (en) Error-correction encoding and decoding
RU2395902C2 (en) Ldpc coding device and methods
US11424762B2 (en) Decoder for low-density parity-check codes
RU2392737C2 (en) Ldpc decoding device and method
US7627801B2 (en) Methods and apparatus for encoding LDPC codes
US8566667B2 (en) Low density parity check code decoding system and method
US7774674B2 (en) LDPC decoder for DVB-S2 decoding
US10536169B2 (en) Encoder and decoder for LDPC code
Li et al. Quasi-cyclic LDPC code design for block-fading channels
WO2004019268A1 (en) Methods and apparatus for encoding ldpc codes
KR20110124659A (en) Apparatus and method for layered decoding in communication system using low-density parity-check codes
US8601342B2 (en) Efficient, programmable and scalable low density parity check decoder
US8671323B2 (en) High throughput decoder architecture for low-density parity-check convolutional codes
US20110179337A1 (en) Memory utilization method for low density parity check code, low density parity check code decoding method and decoding apparatus thereof
KR20110087268A (en) Decoding circuit and encoding circuit
CN101800627B (en) Hardware implementation of multiple-code-rate-compatible high-speed low-density parity-check code encoder
GB2463252A (en) A message passing LDPC matrix decoder with parallel sub-decoders scheduled to avoid memory contention problems
Amaricai et al. Memory efficient FPGA implementation for flooded LDPC decoder
US7870458B2 (en) Parallel arrangement of serial concatenated convolutional code decoders with optimized organization of data for efficient use of memory resources
CN104052500A (en) LDPC code translator and implementation method
WO2007082626A2 (en) Method and apparatus for error correction decoding
Sani et al. An approach based on edge coloring of tripartite graph for designing parallel LDPC interleaver architecture
JPWO2010053152A1 (en) Decoding device, data communication device and data storage device having the decoding device
US9374107B1 (en) Time shared protograph LDPC decoder
Briki et al. A conflict-free memory mapping approach to design parallel hardware interleaver architectures with optimized network and controller

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20130904