US20060236194A1 - Decomposer for parallel turbo decoding, process and integrated circuit - Google Patents

Decomposer for parallel turbo decoding, process and integrated circuit

Info

Publication number
US20060236194A1
US20060236194A1 US11/455,903 US45590306A
Authority
US
United States
Prior art keywords
matrix
addresses
coordinates
decomposer
codes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/455,903
Inventor
Alexander Andreev
Ranko Scepanovic
Vojislav Vukovic
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LSI Corp
Original Assignee
LSI Logic Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by LSI Logic Corp
Priority to US11/455,903
Publication of US20060236194A1
Assigned to LSI CORPORATION (merger; see document for details; assignor: LSI SUBSIDIARY CORP.)
Status: Abandoned

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/65 Purpose and implementation aspects
    • H03M13/6502 Reduction of hardware complexity or efficient processing
    • H03M13/6505 Memory efficient implementations
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/29 Coding, decoding or code conversion, for error detection or error correction, combining two or more codes or code structures, e.g. product codes, generalised product codes, concatenated codes, inner and outer codes
    • H03M13/2957 Turbo codes and decoding
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/65 Purpose and implementation aspects
    • H03M13/6566 Implementations concerning memory access contentions
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/27 Coding, decoding or code conversion, for error detection or error correction, using interleaving techniques

Landscapes

  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Error Detection And Correction (AREA)

Abstract

A decoder for accessing data stored in n memories comprises a function matrix containing addresses of the memory locations at unique coordinates. A decomposer sorts addresses into coordinate locations of first and second m×n matrices such that each row contains no more than one address from the same memory. Positional apparatus stores entries in third and fourth m×n matrices identifying coordinates of addresses in the function matrix such that each entry in the third matrix is at coordinates that match corresponding coordinates in the first matrix, and each entry in the fourth matrix is at coordinates that match corresponding coordinates in the second matrix. The decoder is responsive to entries in the matrices for accessing data in parallel from the memories.

Description

    CROSS-REFERENCE TO RELATED APPLICATION
  • This application is a division of and claims priority from U.S. application Ser. No. 10/299,270, filed Nov. 19, 2002, now U.S. Pat. No. ______, which is entitled “DECOMPOSER FOR PARALLEL TURBO DECODING, PROCESS AND INTEGRATED CIRCUIT” and is assigned to the same assignee.
  • FIELD OF THE INVENTION
  • This invention relates to parallel data processing, and particularly to integrated circuits that perform parallel turbo decoding.
  • BACKGROUND OF THE INVENTION
  • Data processing systems using convolutional codes are theoretically capable of reaching the Shannon limit, a theoretical limit of signal-to-noise ratio for error-free communications. Prior to the discovery of turbo codes in 1993, convolutional codes were decoded with Viterbi decoders. However, as error correction requirements increased, the complexity of Viterbi decoders increased exponentially. Consequently, a practical limit on systems employing Viterbi decoders to decode convolutional codes was about 3 to 6 dB from the Shannon limit. The introduction of turbo codes allowed the design of practical decoders capable of achieving a performance about 0.7 dB from the Shannon limit, surpassing the performance of convolutional-encoder/Viterbi-decoder systems of similar complexity. Therefore, turbo codes offered a significant advantage over prior coding techniques.
  • Turbo codes are generated by interleaving convolutionally encoded data. There are two types of turbo code systems: ones that use parallel concatenated convolutional codes, and ones that use serially concatenated convolutional codes. Data processing systems that employ parallel concatenated convolutional codes decode the codes in several stages. In a first stage, the original data (e.g. a sequence of symbols) are processed, and in a second stage the data obtained by permuting the original sequence of symbols are processed, usually using the same process as in the first stage. The data are processed in parallel, requiring that the data be stored in several memories and accessed in parallel for the respective stage. However, parallel processing often causes conflicts. More particularly, two or more elements or sets of data that are required to be accessed in a given cycle may be in the same memory, and therefore not accessible in parallel. Consequently, the problem becomes one of organizing access to the data so that all required data can be accessed simultaneously in each of the processing stages.
  • Traditionally, turbo decoding applications increased throughput by adding additional parallel turbo decoders. However, in integrated circuit (IC) designs, the additional decoders were embodied on the IC and necessarily increased chip area dramatically. There is a need for a turbo decoder that achieves high throughput without duplication of parallel turbo decoders, thereby achieving reduced IC chip area.
  • SUMMARY OF THE INVENTION
  • The present invention is directed to a decomposer for turbo decoders, which makes possible parallel access to direct and interleaved information. When implemented in an IC chip, the decomposer eliminates the need for turbo decoder duplications, thereby significantly reducing chip area over prior decoders.
  • In one form of the invention, a process is provided to access data stored at addressable locations in n memories. A function matrix is provided having coordinates containing addresses of the addressable locations in the memories. A set of addresses from first and second matrices, each having m rows and n columns, is sorted into unique coordinate locations such that each row contains no more than one address of a location from each respective memory. Third and fourth matrices are created, each having m rows and n columns. The third and fourth matrices contain entries identifying coordinates of addresses in the function matrix such that each entry in the third matrix is at coordinates that match corresponding coordinates in the first matrix, and each entry in the fourth matrix is at coordinates that match corresponding coordinates in the second matrix. Data are accessed in parallel from the memories using the matrices.
  • In some embodiments, the addresses are organized into first and second sets, S_r^q, each containing the addresses. The sets are sorted into the first and second matrices. More particularly, for each set, a plurality of edges between the addresses are identified such that each edge contains two addresses, and each address is unconnected or in not more than two edges. The edges are linked into a sequence, and are alternately assigned to the first and second sets.
  • In some embodiments, each set, S_r^q, of addresses is iteratively divided into first and second subsets S_{r+1}^{2q} and S_{r+1}^{2q+1}, which are placed into respective rows of the respective first and second matrices, until each row contains no more than one address of a location in each respective memory.
  • In other embodiments, a decomposer is provided to decompose interleaved convolutional codes. The decomposer includes the first, second, third and fourth matrices.
  • In yet other embodiments, an integrated circuit includes a decoder and a decomposer including the first, second, third and fourth matrices.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a flowchart of a process of partitioning data into memories in accordance with an aspect of the present invention.
  • FIGS. 2-5 are illustrations useful in explaining the process of FIG. 1.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • The present invention is directed to a decomposer for turbo code decoding, which eliminates the need for turbo decoder duplications.
  • The premise of the present invention can be generalized by considering two arbitrary permutations of a set of numbers, which represents addresses in n memories where data for processing are stored. Assume that each memory is capable of storing a maximal number, m, of words. The addresses can be represented in two tables (matrices), one for each processing stage. Each table has m rows and n columns, and each row represents addresses to be accessed simultaneously during a given clock cycle. Each column represents the addresses in one memory.
  • In accordance with the present invention, the addresses are partitioned into groups such that each row in each of the two tables does not contain more than one address from the same group. Then, stored data from the same group of addresses in one memory allow simultaneous access to all addresses from any row and any table through access to different memories.
  • The algorithm to partition addresses uses input integer numbers m and n, and two m×n matrices, T1 and T2, which represent two different permutations of a set of numbers S = {0, 1, 2, …, n·m−1}. The numbers of set S represent addresses in the respective memory. The process of the present invention determines a function f whose input set is {0, 1, 2, …, n·m−1} and whose output set is {0, 1, 2, …, 2^k−1}, where 2^(k−1) < n ≤ 2^k, that is, f: {0, 1, 2, …, n·m−1} → {0, 1, 2, …, 2^k−1}, such that for every i and every j1 ≠ j2 the relationship f(T_α[i][j1]) ≠ f(T_α[i][j2]) is satisfied, where α = 1, 2. The resulting partitioning gives 2^k subsets of S, one for each function value, such that set S is represented as S = S_0 ∪ S_1 ∪ S_2 ∪ … ∪ S_{2^k−1}.
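  • For illustration only (this sketch is not part of the disclosure; the name is_valid_mapping and the representation of f as a Python dictionary are assumptions), the row constraint on f can be checked directly:

```python
from typing import Dict, List

def is_valid_mapping(f: Dict[int, int],
                     T1: List[List[int]],
                     T2: List[List[int]]) -> bool:
    """True if no row of T1 or T2 maps two entries to the same memory."""
    for table in (T1, T2):
        for row in table:
            banks = [f[addr] for addr in row]
            if len(banks) != len(set(banks)):   # duplicate memory in a row
                return False
    return True
```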
  • The output of the algorithm is a set of matrices, T1 and T2, which provides the addresses of the memories (numbers from 0 to 2^k−1) and the local addresses of all data required to be accessed simultaneously within the memories for a processing stage.
  • Set S is partitioned in k stages. An intermediate stage is denoted by r, where 0 ≤ r < k. At each stage, set S_r^q is divided into two subsets S_{r+1}^{2q} and S_{r+1}^{2q+1}, where q is an index symbolically denoting the original set, q, divided into two new sets, 2q and 2q+1. Starting with r = 0 and q = 0, the initial set, S = S_0^0, is divided into two subsets S_1^0 and S_1^1. At the next stage, sets S_{r+1}^{2q} and S_{r+1}^{2q+1} are each divided into two descendants, S_{r+1}^{2q} = S_{r+2}^{2(2q)} ∪ S_{r+2}^{2(2q)+1} and S_{r+1}^{2q+1} = S_{r+2}^{2(2q+1)} ∪ S_{r+2}^{2(2q+1)+1}. The partitioning iterates until r = k, at which point the number of elements of each subset in any row is either 0 or 1. For example, the initial set where r = 0, S = S_0^q, is divided into two subsets S_1^{2q} and S_1^{2q+1}; sets S_1^{2q} and S_1^{2q+1} are each divided into two descendants, S_1^{2q} = S_2^{2(2q)} ∪ S_2^{2(2q)+1} and S_1^{2q+1} = S_2^{2(2q+1)} ∪ S_2^{2(2q+1)+1}.
  • The number of elements in each intermediate set is m·n·2^(−r) if that value is an integer; otherwise it is one of the two integers closest to m·n·2^(−r). For each intermediate set in the process, the number of set elements in a single row of matrices T1 and T2 is less than or equal to n·2^(−r).
  • At the end point (where r = k), the number of elements from each final set S_k^q (q = 0, 1, …, 2^k−1) in each row of matrices T1 and T2 is equal to 0 or 1, meaning that function f is determined (the indices q of the subsets S_k^q are the values of f) and there is no need for further partitioning. Thus, there is no row in either matrix T1 or T2 which contains more than one element from the same subset. Hence, all numbers in a row have different function values.
  • The process of the partitioning algorithm is illustrated in FIG. 1. The process commences at step 100 with the input of the number n of memories and the size m of each memory. The value of r is initialized at 0. At step 102, k is calculated from the relationship 2^(k−1) < n ≤ 2^k. S_r^q is generated at step 104. Thus, at the first iteration, S_0^q is generated. If, at step 106, r is smaller than k, then at step 108 S_r^q is divided as S_r^q = S_{r+1}^{2q} ∪ S_{r+1}^{2q+1}. At step 110, the value of r is incremented by one and the process loops back to step 104 to operate on the recursions S_1^{2q} and S_1^{2q+1}. Assuming r is still smaller than k at step 106, for the second iteration where r = 1, S_1^{2q} is divided as S_1^{2q} = S_2^{2(2q)} ∪ S_2^{2(2q)+1} and S_1^{2q+1} is divided as S_1^{2q+1} = S_2^{2(2q+1)} ∪ S_2^{2(2q+1)+1}. The process continues until r is equal to k at step 106. As long as r < k, the number of S_r^q elements (addresses) resulting from each iteration of division in one row of T1 and T2 may be more than one. When r = k, each division result contains one or no S_r^q elements in a row of T1 and T2. The process ends at step 112, and the set S is partitioned into 2^k subsets.
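  • The k-stage recursion of FIG. 1 can be summarized in code. The following Python sketch is illustrative only, not part of the disclosure: it assumes a helper split_set (a closure over tables T1 and T2 implementing the edge/link/divide procedure of steps 120-124, sketched further below) and labels each final subset with the index q that becomes the value of f.

```python
import math

def partition(S, n, split_set):
    """Partition S into 2**k subsets in k stages (2**(k-1) < n <= 2**k)."""
    k = math.ceil(math.log2(n))            # smallest k with n <= 2**k
    subsets = {0: set(S)}                  # stage r = 0: the whole set S_0^0
    for _ in range(k):                     # stages r = 0 .. k-1
        nxt = {}
        for q, s in subsets.items():
            even, odd = split_set(s)       # -> S_{r+1}^{2q}, S_{r+1}^{2q+1}
            nxt[2 * q], nxt[2 * q + 1] = even, odd
        subsets = nxt
    return subsets                         # f(x) = q for every x in subsets[q]
```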
  • Consider a set S_r^q = {18, 11, 27, 4, 10, 16, 20, 14, 2} representing memory elements (addresses) at some partitioning stage. The object is to partition S_r^q into subsets such that upon completion of the final stage there are no two elements from the same set in the same row of tables T1 and T2 (FIG. 2). FIG. 3 illustrates the process of partitioning, which includes a first step 120 that constructs two sets of edges, one set per table. The second step 122 links the constructed edges into lists, which are then used in the final step 124 to produce the two subsets S_{r+1}^{2q} and S_{r+1}^{2q+1}.
  • At step 120, the edges are constructed by connecting two adjacent points in each row. As used herein, the term "point" refers to corresponding numbers in the input set. If a row contains an odd number of points, the remaining point is connected with the next remaining point from the next row that also has an odd number of elements. If, after all rows are processed, there is still a point without a pair, that point is left unconnected. For the example of FIG. 2, the two edge sets are
     E1 = {(18,11), (27,4), (10,16), (20,14)} and
     E2 = {(27,16), (20,4), (10,2), (14,18)}.
     Points 2 in T1 and 11 in T2 are unconnected.
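  • The edge construction of step 120 can be sketched as follows (illustrative Python, not from the disclosure; build_edges is a hypothetical name). For each row of one table, the points belonging to the current subset are paired left to right, and a row's leftover point is carried to the next row that also has an odd count:

```python
def build_edges(table, subset):
    """Pair points of `subset` within one table; returns (edges, isolated)."""
    edges, leftover = [], None
    for row in table:
        points = [x for x in row if x in subset]
        if len(points) % 2 == 1 and leftover is not None:
            points.append(leftover)        # pair the carried point here
            leftover = None
        for i in range(0, len(points) - 1, 2):
            edges.append((points[i], points[i + 1]))
        if len(points) % 2 == 1:
            leftover = points[-1]          # carry this row's unpaired point
    return edges, leftover                 # leftover may remain unconnected
```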
  • At step 122, the edges and points identified in step 120 are linked into lists. Each list starts at a point and ends at the same or a different point. This step starts at any point from the set being divided, and looks alternately in tables T1 and T2 for list elements. For purposes of illustration, assume the starting point is point 18 and the starting table is T1 in FIG. 2. Edge (18,11) is the first in the list. Next, a point (if it exists) is found in table T2 that is connected to the end of edge (18,11). In this case point 11 is not connected to any other point in table T2, so point 18, at the start of the edge, is considered. In this case, table T2 identifies that point 14 is connected in an edge with point 18. Because the edge (14,18) found in table T2 is connected to the first point (18) of edge (18,11), the direction of movement through the list is reversed and edge (14,18) is added to the trailing end. Next the process looks for a point in table T1 connected to the end (point 14) of the list in the direction of movement. Because point 14 is edged with point 20 in table T1, point 20 is the next point of the list. The process continues until the second end of the list (point 2) is reached. If, at the end of the list, all points from the set S_r^q are included in the linking, the linking operation is finished. If there are points that do not belong to any list, a new list is started. In the example of FIG. 2, all points are in one list. There may be any number of lists, and there may be none or one "isolated" (unconnected) point.
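  • One way to code the linking of step 122 is sketched below, under the assumption (guaranteed by the construction above) that every point carries at most one edge per table, so alternating between the two edge sets traces out simple paths or cycles. Applied to the example sets E1 and E2, it links all nine points into one list whose ends are the unconnected points (2 in T1, 11 in T2). The name link_lists is illustrative:

```python
def link_lists(edges1, edges2, points):
    """Chain the edges of E1/E2 into alternating lists (paths or cycles)."""
    adj = ({}, {})                         # per table: point -> edge partner
    for t, edges in enumerate((edges1, edges2)):
        for a, b in edges:
            adj[t][a], adj[t][b] = b, a
    unvisited, lists = set(points), []
    while unvisited:
        chain = [unvisited.pop()]          # isolated points give 1-point lists
        for t0 in (0, 1):                  # walk away from the start both ways
            cur, t = chain[-1], t0
            while cur in adj[t] and adj[t][cur] in unvisited:
                cur = adj[t][cur]
                unvisited.discard(cur)
                chain.append(cur)
                t ^= 1                     # alternate between T1 and T2 edges
            chain.reverse()                # next pass grows the other end
        lists.append(chain)
    return lists
```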
  • After completing the linkages of step 122, the points are identified as odd or even, starting from any point. The starting point and all points separated by an odd number of points from the starting point (all even points) are inserted into S_{r+1}^{2q}. All other points (all odd points) are inserted into S_{r+1}^{2q+1}. For example, the points can be indexed with 0 and 1 so that neighboring points have different indices. Thus, all points with a "0" index are inserted into one set (S_{r+1}^{2q}) and all points with a "1" index are in the other set (S_{r+1}^{2q+1}). In the example of FIG. 2, starting the indexing at point 11, the results of this division are the sets S_{r+1}^{2q} = {11, 14, 4, 16, 2} and S_{r+1}^{2q+1} = {18, 20, 27, 10}. Sets S_{r+1}^{2q} and S_{r+1}^{2q+1} are further partitioned until r = k and no row contains more than one element from the original set, S_r^q.
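  • The division of step 124 then amounts to two-coloring each list so that neighboring points land in different subsets; in the FIG. 2 example, starting the indexing at point 11 reproduces the sets {11, 14, 4, 16, 2} and {18, 20, 27, 10}. A sketch (split_by_parity is an illustrative name):

```python
def split_by_parity(lists):
    """Alternate membership along each list: neighbors get different subsets."""
    s_even, s_odd = set(), set()
    for chain in lists:
        for i, point in enumerate(chain):
            (s_even if i % 2 == 0 else s_odd).add(point)
    return s_even, s_odd                   # S_{r+1}^{2q}, S_{r+1}^{2q+1}
```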
  • The outputs of the process are the function f matrix and two "positional" matrices, P1 and P2, that identify the position of elements of the starting tables (matrices) T1 and T2. The four matrices P1, P2, T1 and T2 allow the necessary parallelism in data reading. Function f is represented in the form of a matrix whose column indices are its values and whose column elements are the numbers from the input set which have that value. Thus, in FIG. 5 each column of matrix f contains addresses from one memory. The positional matrices P1 and P2 have the same dimensions as matrices T1 and T2, namely m×n. For each position (i,j) in a matrix T1 or T2, the corresponding position in the corresponding matrix P1 or P2 identifies the position of the corresponding element, T1[i][j] or T2[i][j], in matrix f. For example, in FIG. 5 element T1[2][1]=5 in matrix T1 identifies a position (i,j) in positional matrix P1 of element P1[2][1]. Element P1[2][1] identifies the row and column coordinates (1,5) of element T1[2][1]=5 in matrix f. In matrix T2, element T2[5][4]=5 identifies positional element P2[5][4], which identifies the coordinates (1,5) in matrix f of T2[5][4]=5. Similarly, in matrix T2, element T2[2][1] identifies the (i,j) position in positional matrix P2, which in turn identifies the row and column coordinates (4,7) of element T2[2][1]=15 in matrix f.
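  • The following sketch shows one plausible way to assemble these outputs from the final 2^k subsets; the layout choices (sorted columns, a where-lookup used to fill P1 and P2) are assumptions, not the patent's prescribed construction:

```python
def build_outputs(subsets, T1, T2):
    """Assemble matrix f and positional matrices P1, P2."""
    cols = [sorted(subsets[q]) for q in sorted(subsets)]
    depth = max(len(c) for c in cols)
    f = [[None] * len(cols) for _ in range(depth)]
    where = {}                                   # address -> (row, col) in f
    for q, col in enumerate(cols):
        for i, addr in enumerate(col):
            f[i][q] = addr                       # column q holds memory q
            where[addr] = (i, q)
    P1 = [[where[x] for x in row] for row in T1]
    P2 = [[where[x] for x in row] for row in T2]
    return f, P1, P2
```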
  • Decoding turbo codes is performed using the T1 and T2 matrices, together with the P1 and P2 positional matrices, by accessing one of the T1 or T2 matrices during each parallel processing stage and using the corresponding positional matrix P1 or P2 to identify the addresses in the function matrix, where each column of the function matrix represents a different memory in the system of memories. For example, if a parallel operation required data from the third row of matrix T1 (addresses 21, 5, 1, 19, 34), matrix T1 would identify coordinates (2,0), (2,1), (2,2), (2,3) and (2,4), pointing to corresponding coordinates in matrix P1 where coordinates (1,3), (1,5), (1,6), (1,1) and (1,2) are stored. These are the coordinates of the required addresses in function matrix f, and each is placed in a different column (memory).
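  • As a usage sketch (memories is a hypothetical list of per-bank arrays indexed by local row, matching the (row, column) coordinates produced above), one parallel stage then reduces to a row lookup:

```python
def read_row(i, P, memories):
    """Fetch one parallel stage using row i of a positional matrix P1 or P2."""
    # By construction the entries of P[i] name pairwise distinct memories,
    # so every access below can be issued in the same clock cycle.
    return [memories[bank][local_row] for local_row, bank in P[i]]
```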
  • Although the present invention has been described with reference to preferred embodiments, workers skilled in the art will recognize that changes may be made in form and detail without departing from the spirit and scope of the invention.

Claims (13)

1. A decomposer for decomposing at least a set of parallel concatenated convolutional codes representing addresses in a plurality of memories, the set of codes being arranged at coordinates in a function matrix table, the decomposer comprising:
first and second matrix tables, each having m rows and n columns defining coordinates, each of the first and second tables containing the codes at coordinates such that each row contains no more than one code of a respective group of the codes; and
third and fourth matrix tables, each having m rows and n columns, and containing entries identifying coordinates in the function matrix table and arranged so that each entry in the third matrix table is at coordinates that match coordinates in the first matrix table containing the corresponding code, and each entry in the fourth matrix table is at coordinates that match coordinates in the second matrix table containing the corresponding code.
2. The decomposer of claim 1, wherein the set of codes in the function matrix table represents a function f: {0, 1, 2, 3, …, n·m−1} → {0, 1, 2, 3, …, 2^k−1}.
3. The decomposer of claim 1, further including:
an organizer for organizing the addresses into first and second sets, S_r^q, each containing the addresses, and
a sorter for sorting the first set of addresses into the first matrix table and sorting the second set of addresses into the second matrix table.
4. The decomposer of claim 3, wherein the organizer includes:
an edge identifier for identifying a plurality of edges between the addresses such that each edge contains two addresses, and each address is unconnected or in not more than two edges,
a linker for linking the edges into a sequence, and
an assignor for alternately assigning edges to first and second sets.
5. The decomposer of claim 4, wherein the sorter includes, for each set:
a divider for dividing each set, S_r^q, of addresses into first and second subsets S_{r+1}^{2q} and S_{r+1}^{2q+1},
placer apparatus for placing the first and second subsets into respective rows of the respective first and second matrix table, and
iteration apparatus for iteratively repeating operation of the divider and placer until each row contains no more than one address of a location in each respective memory.
6. The decomposer of claim 3, wherein the sorter includes, for each set:
a divider for dividing each set, S_r^q, of addresses into first and second subsets S_{r+1}^{2q} and S_{r+1}^{2q+1},
placer apparatus for placing the first and second subsets into respective rows of the respective first and second matrix table, and
iteration apparatus for iteratively repeating operation of the divider and placer until each row contains no more than one address of a location in each respective memory.
7. The decomposer of claim 1 wherein the first, second, third and fourth matrix tables are implemented at least in part in at least one integrated circuit.
8. A decomposer for decomposing at least a set of parallel concatenated convolutional codes representing addresses in a plurality of memories, the set of codes being arranged at coordinates in a function matrix implemented at least in part in at least one integrated circuit, the decomposer comprising:
first and second matrices implemented at least in part in at least one integrated circuit, each of the first and second matrices having m rows and n columns defining coordinates, each of the first and second tables containing the codes at coordinates such that each row contains no more than one code of a respective group of the codes; and
third and fourth matrices implemented at least in part in at least one integrated circuit, each of the third and fourth matrices having m rows and n columns, and containing entries identifying coordinates in the function matrix and arranged so that each entry in the third matrix is at coordinates that match coordinates in the first matrix containing the corresponding code, and each entry in the fourth matrix is at coordinates that match coordinates in the second matrix containing the corresponding code.
9. The decomposer of claim 8, wherein the set of codes in the function matrix represents a function f: {0, 1, 2, 3, …, n·m−1} → {0, 1, 2, 3, …, 2^k−1}.
10. The decomposer of claim 8, further including:
an organizer for organizing the addresses into first and second sets, S_r^q, each containing the addresses, and
a sorter for sorting the first set of addresses into the first matrix and sorting the second set of addresses into the second matrix.
11. The decomposer of claim 10, wherein the organizer includes:
an edge identifier for identifying a plurality of edges between the addresses such that each edge contains two addresses, and each address is unconnected or in not more than two edges,
a linker for linking the edges into a sequence, and
an assignor for alternately assigning edges to first and second sets.
12. The decomposer of claim 11, wherein the sorter includes, for each set:
a divider for dividing each set, S_r^q, of addresses into first and second subsets S_{r+1}^{2q} and S_{r+1}^{2q+1},
placer apparatus for placing the first and second subsets into respective rows of the respective first and second matrix, and
iteration apparatus for iteratively repeating operation of the divider and placer until each row contains no more than one address of a location in each respective memory.
13. The decomposer of claim 10, wherein the sorter includes, for each set:
a divider for dividing each set, S_r^q, of addresses into first and second subsets S_{r+1}^{2q} and S_{r+1}^{2q+1},
placer apparatus for placing the first and second subsets into respective rows of the respective first and second matrix, and
iteration apparatus for iteratively repeating operation of the divider and placer until each row contains no more than one address of a location in each respective memory.
US11/455,903 2002-11-19 2006-06-19 Decomposer for parallel turbo decoding, process and integrated circuit Abandoned US20060236194A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/455,903 US20060236194A1 (en) 2002-11-19 2006-06-19 Decomposer for parallel turbo decoding, process and integrated circuit

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/299,270 US7096413B2 (en) 2002-11-19 2002-11-19 Decomposer for parallel turbo decoding, process and integrated circuit
US11/455,903 US20060236194A1 (en) 2002-11-19 2006-06-19 Decomposer for parallel turbo decoding, process and integrated circuit

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US10/299,270 Division US7096413B2 (en) 2002-11-19 2002-11-19 Decomposer for parallel turbo decoding, process and integrated circuit

Publications (1)

Publication Number Publication Date
US20060236194A1 true US20060236194A1 (en) 2006-10-19

Family

ID=32297653

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/299,270 Expired - Fee Related US7096413B2 (en) 2002-11-19 2002-11-19 Decomposer for parallel turbo decoding, process and integrated circuit
US11/455,903 Abandoned US20060236194A1 (en) 2002-11-19 2006-06-19 Decomposer for parallel turbo decoding, process and integrated circuit

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US10/299,270 Expired - Fee Related US7096413B2 (en) 2002-11-19 2002-11-19 Decomposer for parallel turbo decoding, process and integrated circuit

Country Status (1)

Country Link
US (2) US7096413B2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7305593B2 (en) * 2003-08-26 2007-12-04 Lsi Corporation Memory mapping for parallel turbo decoding
CN103401569B (en) * 2013-08-08 2016-05-11 山东大学 The blind knowledge method for distinguishing of a kind of (n, k, m) systematic convolutional code
CN104184558A (en) * 2014-09-11 2014-12-03 山东大学 Fast blind recognition method for turbo code output block length
CN104270225B (en) * 2014-09-11 2017-11-03 山东大学 A kind of code-word type blind-identification method of error control coding

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4700294A (en) * 1982-10-15 1987-10-13 Becton Dickinson And Company Data storage system having means for compressing input data from sets of correlated parameters
US4982342A (en) * 1987-11-05 1991-01-01 Kabushiki Kaisha Toyota Chuo Kenkyusho Image processor system having multifunction look-up table units
US5550744A (en) * 1991-09-12 1996-08-27 Engel Maschinenbau Gesselschaft Mbh Method of controlling a machine for the manufacture of products, in particular for controlling an injection molding machine
US6516437B1 (en) * 2000-03-07 2003-02-04 General Electric Company Turbo decoder control for use with a programmable interleaver, variable block length, and multiple code rates
US6886068B2 (en) * 2001-12-18 2005-04-26 Kabushiki Kaisha Toshiba Disk array apparatus and data update method for storing tag blocks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5550774A (en) * 1995-09-05 1996-08-27 Motorola, Inc. Memory cache with low power consumption and method of operation

Also Published As

Publication number Publication date
US20040098653A1 (en) 2004-05-20
US7096413B2 (en) 2006-08-22

Legal Events

Date Code Title Description
AS Assignment

Owner name: LSI CORPORATION, CALIFORNIA

Free format text: MERGER;ASSIGNOR:LSI SUBSIDIARY CORP.;REEL/FRAME:020548/0977

Effective date: 20070404

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION