GB2463011A - Interleaved decoding using a selectable number of parallel decoding units interconnected with RAM units - Google Patents


Info

Publication number
GB2463011A
GB2463011A (Application GB0815531A)
Authority
GB
United Kingdom
Prior art keywords
decoder
ram
operable
units
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0815531A
Other versions
GB2463011B (en)
GB0815531D0 (en)
Inventor
Imran Ahmed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Europe Ltd
Original Assignee
Toshiba Research Europe Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Research Europe Ltd filed Critical Toshiba Research Europe Ltd
Priority to GB0815531A priority Critical patent/GB2463011B/en
Publication of GB0815531D0 publication Critical patent/GB0815531D0/en
Publication of GB2463011A publication Critical patent/GB2463011A/en
Application granted granted Critical
Publication of GB2463011B publication Critical patent/GB2463011B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/27Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes using interleaving techniques
    • H03M13/2739Permutation polynomial interleaver, e.g. quadratic permutation polynomial [QPP] interleaver and quadratic congruence interleaver
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/29Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes combining two or more codes or code structures, e.g. product codes, generalised product codes, concatenated codes, inner and outer codes
    • H03M13/2957Turbo codes and decoding
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/29Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes combining two or more codes or code structures, e.g. product codes, generalised product codes, concatenated codes, inner and outer codes
    • H03M13/2957Turbo codes and decoding
    • H03M13/2978Particular arrangement of the component decoders
    • H03M13/2981Particular arrangement of the component decoders using as many component decoders as component codes
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/37Decoding methods or techniques, not specific to the particular type of coding provided for in groups H03M13/03 - H03M13/35
    • H03M13/39Sequence estimation, i.e. using statistical methods for the reconstruction of the original codes
    • H03M13/3972Sequence estimation, i.e. using statistical methods for the reconstruction of the original codes using sliding window techniques or parallel windows
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/65Purpose and implementation aspects
    • H03M13/6566Implementations concerning memory access contentions
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/004Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0045Arrangements at the receiver end
    • H04L1/0047Decoding adapted to other signal detection operation
    • H04L1/005Iterative decoding, including iteration between signal detection and decoding operation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/004Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0045Arrangements at the receiver end
    • H04L1/0047Decoding adapted to other signal detection operation
    • H04L1/005Iterative decoding, including iteration between signal detection and decoding operation
    • H04L1/0051Stopping criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/004Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0045Arrangements at the receiver end
    • H04L1/0052Realisations of complexity reduction techniques, e.g. pipelining or use of look-up tables
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/004Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0056Systems characterized by the type of code used
    • H04L1/0064Concatenated codes
    • H04L1/0066Parallel concatenated codes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/004Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0056Systems characterized by the type of code used
    • H04L1/0071Use of interleaving

Abstract

A data decoder device operable to perform interleaved decoding operations. The device has an array of decoder units (5A-5P) operable to provide a selectable number of decoder units for use in parallel, has RAM addresses in RAM units (6A-6P), and has an interconnect (7, 8) operable to provide interconnection between the decoder units and the RAM addresses. Interconnection addresses are provided by an addressing unit. The addressing unit is operable to perform barrel shift operations on a data set representing decoder units relative to data sets representing RAM addresses to provide updated interconnection addresses, thereby avoiding RAM contention when performing decoding. In a specific embodiment the device is operable to perform turbo decoding.

Description

A DATA DECODING DEVICE AND METHOD
The present invention relates to a method of, and apparatus for, the decoding of data signals. The invention has particular application to data decoding involving multiple parallel decoder units accessing a random access memory (RAM) through an interleaver. Turbo decoding is an example of this type of data decoding.
In digital communications, errors are introduced in noisy channels. Forward error correction (FEC) is often applied to mitigate these errors. FEC involves coding and decoding of transmitted data.
One known FEC decoding scheme is turbo decoding. Turbo decoding achieves a high degree of error correction, close to the accepted Shannon limit for performance.
Turbo decoding applies a principle of probabilistic decoding and uses soft-input soft-output (SISO) decoders working iteratively through an interleaver. A limitation of conventional turbo decoders arises from a high decoding latency in interleaving and de-interleaving operations between the decoder stages. This latency adversely affects the rate at which data can be decoded.
The rate at which data can be decoded can be improved by providing multiple decoder elements in parallel. Examples of this approach are given in: Jing Sun and O.Y. Takeshita, "Interleavers for turbo codes using permutation polynomials over integer rings", IEEE Transactions on Information Theory, Jan 2005, pp. 101-119; C. Berrou, A. Glavieux and P. Thitimajshima, "Near Shannon Limit Error-Correcting Coding and Decoding: Turbo Codes", Proc. ICC, Geneva, Switzerland, 1993, pp. 1064-1070; J. Hagenauer and P. Hoeher, "A Viterbi algorithm with soft-decision outputs and its applications", Proc. of Globecom '89, Dallas, Texas, pp. 47.11-47.17, Nov. 1989; and P. Robertson, E. Villebrun and P. Hoeher, "A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain", Proc. ICC '95, pp. 1009-1013.
In these examples, a trellis or sectionalised trellis representation is used. A trellis representation has a notion of time traversing from left to right, and each time step in the trellis is traversed (bit by bit) with incoming data. The trellis of these examples is divided into sub-trellises so that multiple processes can be used in parallel, one on each sub-trellis.
An input frame of size N is divided into M segments, and parallel SISO decoders are assigned to perform the same algorithm on the respective M segments.
Each parallel SISO decoder requires access to the RAM addresses corresponding to its respective segment. If all parallel SISO decoders access the same RAM at the same time instants across the segment boundaries, collisions may occur. Therefore, implementing effective turbo decoding requires an efficient means to avoid these collisions, or to provide contention-free access to RAM.
An algebraic solution to the problem of providing contention-free RAM access in parallel turbo decoding is provided in Oscar Y. Takeshita, "On Maximum Contention-Free Interleavers and Permutation Polynomials Over Integer Rings", IEEE Transactions on Information Theory, Mar 2006, pp. 1249-1253.
The solution provided in Takeshita is based on a mathematical description given earlier in Costello, D.J., et al., "Contention-free interleavers", IEEE Proceedings ISIT 2004, 27 June - 2 July 2004, p. 54. According to Takeshita, an interleaver f(x), 0 ≤ x < N, is contention-free for M segments of size W each if, for both π = f and π = f⁻¹, ⌊π(j + tW)/W⌋ ≠ ⌊π(j + vW)/W⌋, where 0 ≤ j < W, 0 ≤ t, v < M and t ≠ v.
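This condition lends itself to a direct software check. The sketch below is our own illustration (the function name and example permutations are assumptions, not taken from the patent): it tests whether a permutation pi over 0..N-1 is contention-free for M windows of size W = N/M, by checking that the M parallel accesses made at each local step j fall into distinct windows. The full test described above applies the same check to both f and its inverse.

```python
def is_contention_free(pi, m):
    """Check the windowed contention-free condition for permutation pi."""
    n = len(pi)
    w = n // m  # window (segment) size W
    for j in range(w):
        # Window index hit by each of the m parallel decoders at step j.
        banks = {pi[j + t * w] // w for t in range(m)}
        if len(banks) != m:  # two decoders hit the same window: contention
            return False
    return True

# The identity permutation is trivially contention-free...
assert is_contention_free(list(range(16)), 4)
# ...while this permutation sends the accesses at step j = 0 (indices 0
# and 4) into the same window, so the M = 2 decoders would collide.
assert not is_contention_free([0, 4, 1, 5, 2, 6, 3, 7], 2)
```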
One interleaver operation, known for general application in data processing, is the Quadratic Polynomial Permutation (QPP) interleaver. As discussed in the same document by Takeshita, a QPP interleaver is specified by the quadratic expression: f(x) = (f1x + f2x²) mod N, 0 ≤ x < N.
An interleaver is maximally contention-free if all window sizes W which divide N are contention-free. Maximal contention-free interleaving is also discussed in Costello, D.J., et al., "Contention-free interleavers", IEEE Proceedings ISIT 2004, 27 June - 2 July 2004, p. 54.
The same document establishes that QPP interleavers are maximally contention-free and are fully specified by the quadratic expression: f(x) = (f1x + f2x²) mod N, 0 ≤ x < N. The parameters f1 and f2 can be specified for 3rd Generation Partnership Project (3GPP) Long Term Evolution (LTE) frames of size N by reference to the publications www.3GPP.org, "Technical Specification Group Radio Access Network: Multiplexing and channel coding", Release 8, 3GPP TS 36.212, and "QPP interleaver design for LTE", 3GPP TSG RAN WG1 47bis, Tech. Rep. R1-070060, Jan 2007.
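As a concrete illustration of the expression above, the following Python sketch computes QPP addresses and verifies that the mapping is a permutation. The parameters f1 = 3, f2 = 10 used here are the TS 36.212 values for the smallest LTE frame size, N = 40; the function name is our own.

```python
def qpp_interleave(x, n, f1, f2):
    """QPP interleaved address f(x) = (f1*x + f2*x^2) mod n."""
    return (f1 * x + f2 * x * x) % n

N, F1, F2 = 40, 3, 10  # TS 36.212 parameters for frame size 40
addresses = [qpp_interleave(x, N, F1, F2) for x in range(N)]

# A valid interleaver must visit every address exactly once.
assert sorted(addresses) == list(range(N))
assert addresses[:4] == [0, 13, 6, 19]
```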
The performance of QPP interleavers has been shown to be better than S-Random interleavers by Motorola in their document, "Performance of contention free interleavers for LTE turbo codes", 3GPP TSG RAN WG1 47bis, Tech. Rep. R1-070055, Jan 2007.
Turbo decoders are documented in various communication standards, such as 3GPP UMTS. Due to the high performance of turbo decoding, it is desirable to apply it to higher data rate standards, such as in the Gb/s wireless domain.
A number of known turbo decoder devices and methods are disclosed in: Bougard, B. et al., "A Scalable 8.7nJ/Bit 75.6Mb/s parallel concatenated convolutional turbo codec", in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, 2003, pp. 152-153; Bickerstaff, B. et al., "A 24Mb/s radix-4 Log MAP turbo decoder for 3GPP-HSDPA mobile wireless", in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, 2003, pp. 150-151; Thul, Michael J. et al., "A scalable system architecture for high-throughput turbo decoders", in IEEE Workshop on Signal Processing Systems, 2002, pp. 152-158; Giulietti, A. et al., "An 80 Mb/s Low-power Scalable Turbo Codec Core", in Custom Int. Circuits Conf., 2002, pp. 389-392; and Dobkin, R. et al., "Parallel VLSI Architecture for MAP Turbo Decoder", in 13th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, 2002, pp. 384-388.
Conventional turbo decoders can be characterized by a Very Large Scale Integration (VLSI) array of decoder units. Conventional decoder array structures involve all of the decoder units irrespective of the rate of data to be decoded. Therefore, the power consumption of these conventional decoders is determined by the maximum rate of data to be decoded.
A challenge in effectively implementing turbo decoding lies in adapting the power consumption of devices which need high data rate capability, but may often work at relatively low data rates.
An aspect of the present invention provides a turbo decoder which uses an array of SISO decoder clusters, wherein the array is reconfigurable so that the number of SISO clusters used in parallel can be selected, the array using QPP interleaving and providing contention-free RAM access addressing using a methodology based on an observation that RAM access addressing may be represented by ring patterns if QPP interleaving is used, if appropriate data frame sizes are processed, and if appropriate numbers of decoder clusters are used in parallel. The array may be a VLSI array. This aspect provides a decoder with maximally contention-free interleaving, which benefits from using parallel SISO decoder units, and which can be adapted for various data rates. This aspect allows RAM access addressing to be updated with each iteration or half iteration of an interleaved SISO decoder operation.
In one aspect the invention is a data decoding device operable to perform interleaved decoding operations comprising: an array of decoder units, the array operable to provide a selected number of decoder units for use in parallel; one or more random access memory (RAM) units; an interconnect operable to provide access to the one or more RAM units for the selected number of decoder units; an addressing unit operable to provide access addresses for the interconnection of the selected number of decoder units and RAM addresses, wherein an access address relates a decoder unit to a RAM address, and wherein the addressing unit is operable to perform barrel shift operations to shift a data set representing each of the selected number of decoder units relative to a data set representing a corresponding number of RAM addresses to provide updated interconnection addresses.
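The barrel shift addressing of this aspect can be sketched in a few lines of Python (our own naming and a 4-unit example; the actual device uses a hardware barrel shifter): the decoder-to-RAM mapping is held as a circular pattern, and each addressing update is a single rotation, which keeps the mapping a permutation and hence keeps the parallel accesses contention-free.

```python
def barrel_shift(mapping, shift):
    """Rotate the decoder-to-RAM mapping left by `shift` positions."""
    p = len(mapping)
    return [mapping[(i + shift) % p] for i in range(p)]

# mapping[i] = RAM unit currently addressed by decoder unit i
ram_banks = [0, 1, 2, 3]
assert barrel_shift(ram_banks, 1) == [1, 2, 3, 0]

# Any rotation of a permutation is still a permutation, so no two
# decoder units are ever connected to the same RAM unit.
assert sorted(barrel_shift(ram_banks, 3)) == ram_banks
```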
The device may be operable to perform Quadratic Polynomial Permutation interleaved decoding operations.
The selected number of decoder units may be constrained to a defined set of numbers, wherein the defined set corresponds to a size of frame for data to be decoded. The set may provide contention-free RAM access address patterns for QPP interleaved operations. These patterns may be circular, allowing barrel shift operations to be used to provide rapidly updated RAM access addressing. These updates may be carried out with each new iteration of the decoding operations.
The defined set of numbers may comprise the numbers 1, 2, 4, 8 and 16. This may be for frame sizes which are multiples of 2048 bits.
The device may be operable to select a number of decoder units in parallel according to a rate of data to be decoded.
The device may be operable to perform iterations of the interleaved decoder operations and the addressing unit may be operable to perform one or more barrel shift operations per iteration.
The device may be operable to perform iterations of the interleaved decoder operations having two phases and the addressing unit may be operable to perform one or more barrel shift operations each phase.
The addressing unit may be extrinsic, or placed external, to the array.
The RAM addresses may correspond to a RAM which is extrinsic, or RAM which is placed external to the array.
The array may comprise an integrated circuit.
The integrated circuit may comprise a very-large-scale integration integrated circuit.
The decoder units may comprise soft-in soft-out decoder units.
The decoder units may be operable to perform maximum likelihood decoder operations.
The device may be operable to perform turbo data decoding operations.
A configuration controller may be provided, operable to provide a state machine to select the number of decoder units in parallel.
The configuration controller may be extrinsic, or placed external, to the array.
The interconnect may be implemented by way of an integrated circuit.
In another aspect the invention comprises a set of processor readable instructions, the instructions being operable to cause a processor to be configured as the device recited above.
In another aspect the invention comprises a method of performing interleaved data decoding operations using a selected number of decoder units in parallel and using one or more RAM addresses with iterative RAM access operations, wherein the method further comprises: storing a first data set representing each selected decoder unit; storing a second data set representing a corresponding number of RAM addresses; performing one or more barrel shift operations to shift the first data set relative to the second data set; and performing one or more RAM access operations according to interconnection addresses provided by the relative alignment of the first and second data sets.
The data decoding operations may be QPP interleaved.
In a further aspect the invention comprises a data decoder device operable to perform interleaved decoder operations, the device comprising: an array of decoder units, operable to provide a selected number of decoder units for use in parallel; one or more random access memory (RAM) addresses; an interconnect operable to provide access to the one or more RAM addresses for the decoder units; an addressing unit operable to provide contention free access addresses for the interconnection of the selected number of decoder units and RAM addresses, wherein an access address relates a decoder unit to a RAM address; a controller operable to select the number of decoder units according to characteristics of data to be decoded and operable to control the interconnect to provide interleaved access to the RAM for selected numbers of decoder units.
The selected number of decoder units may be constrained to a defined set of numbers, the defined set chosen for a size of frame for data to be decoded. The interleaved operations may be QPP interleaved. The selected number of decoder units may be constrained to provide circular address patterns.
The defined set of numbers may comprise the numbers 1, 2, 4, 8 and 16. This may be for frame sizes which are multiples of 2048 bits.
The number of decoder units for use in parallel may be selected according to a rate of data to be decoded.
The device may be operable to perform iterations of the interleaved decoder operations and the addressing unit is operable to perform a barrel shift operation each iteration.
A configuration controller may be provided in the device as recited above in this aspect of the invention, the controller being operable to select the number of decoder units in parallel.
The configuration controller may comprise a separate processor which may be as simple as a state machine.
The configuration controller may be extrinsic, or placed external, to the array.
The interconnect may comprise an integrated circuit.
In another aspect the invention provides a method of decoding data comprising performing decoding operations using a selected number of decoder units in parallel and one or more RAM addresses using iterative RAM access operations, wherein RAM access operations are interleaved, and wherein the method further comprises: storing a first data set having elements representing each selected decoder unit; storing a second data set having elements representing each RAM address; performing one or more barrel shift operations to shift the first data set relative to the second data set; and performing one or more RAM access operations according to interconnection addresses provided by the relative positions of elements of the first and second data sets.
In another aspect the invention provides a data decoder device operable to perform interleaved decoder operations, the device comprising: an array of decoder units, operable to provide a selected number of decoder units for use in parallel; one or more random access memory (RAM) addresses; an interconnect operable to provide access to the one or more RAM addresses for the decoder units; an addressing unit operable to provide contention-free access addresses for the interconnection of the selected number of decoder units and RAM addresses, wherein an access address relates a decoder unit to a RAM address; and a controller operable to select the number of decoder units according to characteristics of data to be decoded and operable to control the interconnect to provide interleaved access to the RAM for the selected number of decoder units.
The controller may comprise a state machine, the state machine may be extrinsic to the array.
In another aspect the invention provides a computer program product bearing computer executable instructions which, when executed in a computer, will cause the computer to configure a configurable device to provide the device defined above.
Further aspects, advantages and features of the invention may become apparent to the reader from the following description of specific embodiments thereof, with reference to the accompanying drawings, in which:
Figure 1a depicts a data decoder device according to a specific embodiment of the present invention;
Figure 1b depicts a tri-state buffer used in the data decoder device in Figure 1a;
Figure 1c depicts an element of a reconfigurable interconnect used in the data decoder device in Figure 1a;
Figure 1d depicts an element of an S-box used in the data decoder device in Figure 1a;
Figure 1e depicts an element of a C-box used in the data decoder device in Figure 1a;
Figure 2 depicts a decoder unit included in the data decoder device illustrated in Figure 1a;
Figure 3 depicts a state machine included in the data decoder device illustrated in Figure 1a;
Figure 4 depicts an interleaving-deinterleaving process used by the data decoder device illustrated in Figure 1a;
Figure 5 depicts a state machine diagram for a unified state machine included in a data decoder device illustrated in Figure 1a;
Figure 6 depicts an Add Compare Select (ACS) operation used by the data decoder device illustrated in Figure 1a;
Figure 7 depicts log-likelihood-ratio calculation in a decoder unit used by the data decoder device illustrated in Figure 1a;
Figure 8 depicts an interleaver included in the data decoder device illustrated in Figure 1a;
Figure 9 depicts RAM address configurations used by the data decoder device illustrated in Figure 1a;
Figure 10 depicts barrel interconnection addressing used by the data decoder device illustrated in Figure 1a;
Figure 11 depicts interconnection of decoder units and RAM addresses of the data decoder device illustrated in Figure 1a;
Figure 12 depicts a barrel shifter for a specific data frame size, the barrel shifter unit being included in the data decoder device illustrated in Figure 1a;
Figure 13 depicts the percentage of the area of a VLSI implementation of components of a data decoder device according to a specific embodiment of the present invention similar to the data decoder device illustrated in Figure 1a;
Figure 14 depicts power consumption of VLSI components of a data decoder device according to a specific embodiment of the present invention, the power consumption being shown as a percentage of the total power consumption of the decoder unit;
Figure 15 depicts the relationship between the configuration and the achievable rate of data decoding of the data decoder device illustrated in Figure 1a;
Figure 16 depicts the relationship between the rate of data decoding and frame size for various configurations of the data decoder device illustrated in Figure 1a;
Figure 17 depicts decoding data rates for different frame sizes for a particular configuration of the data decoder device illustrated in Figure 1a, which utilises 16 decoder units in parallel.
Figure 1a is a schematic diagram of a reconfigurable array data decoder device 1 for parallel turbo decoding using Quadratic Polynomial Permutation (QPP) interleaving according to a specific embodiment of the present invention. An array 2 is shown to communicate with the external barrel shifter unit 3 and an external configuration controller 4.
In this specific embodiment the configuration controller 4 is a Unified State Machine (USM) 4. The USM 4 is made common to all decoder units 5 and RAM units 6. The USM 4 is placed externally to the array with direct connection to the decoder 5 and RAM 6 clusters.
QPP interleaver addressing allows all parallel decoder units 5 in this embodiment to access parallel RAM units 6 in a contention-free fashion. Although the access pattern is contention-free, due to the inherent property of QPP interleaving, effective reconfiguration needs to be fast enough to deal with a decoder unit 5 generating an interleaved address on every clock cycle. The reconfigurable resource needs to provide the correct interconnect pattern each cycle. Also, the array 2, shown in Figure 1a, has the flexibility to use 1, 2, 4, 8 or 16 decoder blocks, while the RAM unit 6 can be divided in sizes from 2K×4 to 8K×1. Providing this is non-trivial and would conventionally require large reconfigurable hardware resources to be dedicated to the desired high-speed interconnection. These resources would conventionally introduce power, area and data-rate penalties.
Also according to this specific embodiment, the array is implemented on a VLSI integrated circuit. Various cell library implementations will be known to the skilled reader and any suitable known library may be used. However, this specific embodiment uses a 90nm Toshiba standard cell library and occupies an area of 11.11 mm².
The array 2 has a number of decoder units 5A to 5P, in the form of 16 parallel SISO units, clusters or modules. The decoder units 5A-5P communicate with random access memory (RAM) units 6A-6P. It will be understood by the reader that the RAM units 6A-6P may be RAM units within a single extrinsic RAM device. In the case of this specific embodiment, these RAM clusters are extrinsic to the array 2.
A set of S boxes 7A-7P and C boxes 8A-8P provide dynamic and static reconfigurable interconnection between the decoder units 5 and the RAM units 6.
For static reconfiguration of the array, the reconfiguration bits are passed during start-up. The main element used in the reconfigurable interconnects is a tri-state buffer, as shown in Figure 1b. One bit is required for each configurable switch, to operate the switch. A bidirectional switch is implemented using two unidirectional switches (not shown).
A static interconnect in the array 2 consists of a collection 202 of reconfigurable switches in switch boxes (S boxes) 7 and connection boxes (C boxes) 8, as shown in Figure 1c.
A switch block 204 is a programmable interconnect placed at the intersection of each horizontal and vertical routing channel. Figure 1d shows a simple switch box.
A connection box 8, depicted in Figure 1e, connects pins 205 of the logic block 206 to tracks 207 via configurable switches 208. Usually one pin is assigned to one track; however, this can be reduced or increased to change the flexibility of the array.
The S boxes 7 and C boxes 8 allow the array 2 to be reconfigured so that the number of decoder units 5 that are used in parallel can be selected. In this specific embodiment, the parallel decoder units are used to provide turbo decoding according to a parallel concatenated convolutional code (PCCC) as discussed in the document by Oscar Y. Takeshita, "On Maximum Contention-Free Interleavers and Permutation Polynomials Over Integer Rings", IEEE Transactions on Information Theory, Mar 2006, pp. 1249-1253.
In this form, a turbo decoder consists of two SISO decoders concatenated through an interleaver-deinterleaver structure. As known to the skilled reader, the use of the interleaver-deinterleaver structure allows low-weight code words produced by a single encoder to be transformed into high-weight code words for the overall encoder. This improves the error-correcting capability of the decoder. In the case of the specific embodiment depicted in Figure 1a, the decoder units 5 are SISO Maximum Likelihood (ML) decoder clusters.
The configuration controller, illustrated as a unified state machine (USM) 4, is a single Finite State Machine (FSM). It communicates in common with all decoder units 5 and RAM units 6. Although described in more detail with reference to Figure 3, one of the operations of the configuration controller is to control the configuration of the array 2.
The USM 4 is external to the array 2 but has direct connections to RAM clusters and Max Likelihood decoder blocks in Figure 1. Direct connections avoid using the reconfigurable interconnects, thus decreasing routing complexity.
Figure 2 depicts a device functional architecture for a decoder unit 5. Internally, the decoder unit 5 is a ML cluster which performs Soft-Input Soft-Output (SISO) maximum log MAP decoding.
The decoder unit 5 has the following components: input RAMs 11A-11D; branch metric blocks 12A-12C; forward processor blocks 13; reverse processor 22a and reverse processor acquisition blocks 14; a forward processor RAM 15; a log likelihood ratio calculation block 17; an interleaver-deinterleaver address unit 18; input LIFO RAMs 19A and 19B and output FIFO RAMs 20A and 20B; and a contention-free QPP interleaver 21.
Also depicted in Figure 2 are the extrinsic RAM 6 and the extrinsic USM 4. The operations of the decoder unit 5 are controlled by the USM 4. The USM 4 is made up of a number of finite state machines (FSMs), each of which controls an individual decoder unit 5.
The USM 4 controls a number of decoder units 5A-5P used in parallel. Reverse processor dummy 22a and reverse processor 22b, shown in Figure 2, are two identical blocks. Each of these blocks consists of 8 ACS components. The Read Counter 14 is a binary up-counter that provides the non-interleaved address to the extrinsic RAM. The arrow 23 depicts a data input communicating with the forward processor RAM 15.
The data decoding device 2 and the decoder unit 5 within it implement a windowed max log MAP algorithm as described in Viterbi. In its simplest form this algorithm uses two reverse processors in parallel with one forward processor. The reverse processors are shown as the reverse processor dummy 22a and the reverse processor 22b of Figure 2. The forward processor is shown as the forward processor 23A-23G.
Figure 3 depicts, in greater detail, the unified state machine depicted in Figure la. As shown in Figure 3, the USM 4 includes three interacting state machines.
First is an "iterations control" state machine 25, which implements the flow of the iterative max log MAP decoding algorithm. The state machine has the states: "idle" 26, which represents the end of decoding iterations; "SISO 1" 27, representing the state when SISO 1 is decoding and SISO 2 is idle as depicted in Figure 4; "SISO 2" 28, representing SISO 2 decoding and SISO 1 idle; and an "end of iterations" decision 29, which returns the state machine to "idle" 26 if true (i.e. the number of iterations is completed) or to "SISO 1" 27 if iterations are not completed. During the decoding operation the state machine flows from "SISO 1" to "SISO 2" and from "SISO 2" to "SISO 1" unless the end of iterations has been reached, at which point the state machine moves to "idle" 26.
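The flow just described can be sketched as a small Python model. This is an illustrative reading of the iterations control state machine of Figure 3, not the patented hardware; the state names follow the description above and the iteration count is an invented parameter.

```python
# Illustrative sketch of the "iterations control" state machine 25:
# alternate SISO1/SISO2 half iterations, return to IDLE when the
# configured number of full iterations has completed.

class IterationsControl:
    IDLE, SISO1, SISO2 = "IDLE", "SISO1", "SISO2"

    def __init__(self, num_iterations):
        self.num_iterations = num_iterations  # full iterations to perform
        self.done = 0                         # completed full iterations
        self.state = self.IDLE

    def start(self):
        self.done = 0
        self.state = self.SISO1              # SISO 1 decodes, SISO 2 idle

    def step(self):
        """Advance by one half iteration."""
        if self.state == self.SISO1:
            self.state = self.SISO2          # SISO 2 decodes, SISO 1 idle
        elif self.state == self.SISO2:
            self.done += 1                   # one full iteration finished
            # "end of iterations" decision 29
            self.state = self.IDLE if self.done >= self.num_iterations else self.SISO1
        return self.state

fsm = IterationsControl(num_iterations=2)
fsm.start()
trace = [fsm.step() for _ in range(4)]
# trace == ["SISO2", "SISO1", "SISO2", "IDLE"]
```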
A second of the three interacting state machines of the USM 4 is a "windows control" state machine 30, which controls a windowing algorithm of SISO decoding, as known to the skilled reader and shown in Figure 5. This windows control state machine 30 has the states: "start" 31, "state 0-L" 32, "state L-2L" 33, "state 2L-3L" 34, "state 3L-4L" 35 and an "end of processing" decision 36. If processing has not completed, control keeps alternating between states 34 and 35. The state machine 30 can move from one state 31-35 to the next state in that order. The state machine 30 can also remain in the same state from one time instant to the next. The "end of processing" decision returns the state machine 30 either to "state 2L-3L" 34 or to "start" 31.
The windows control state machine 30 communicates with a third state machine, the "memory control" state machine 37. That state machine 37 enables and disables the input RAM 11 and extrinsic RAM 6. This state machine 37 has the states: "start" 38; "start input LIFO" 39; "start windows control" 40; "start output LIFO" 41; and "write enable OP RAM" 42. The state machine 37 can transition from one state 38-42 to the next in that order. It can also remain in any one of states 39-42.
The state machine 37 can move from the "write enable OP RAM" 42 to "start" 38.
Referring again to Figure 2, initialization of results for the reverse processor involves the reverse processor dummy 22, which is used after a warm-up period. The branch metric calculation blocks 12A-12C calculate the branch metrics independently for the forward processor 13, 23A-23G, the reverse processor and the reverse processor dummy 22. The branch metric calculators 12A-12C receive input metrics from input RAMs 11A-11D via multiplexers 24A-24D.
The input RAMs 11A-11D, which are also labelled input RAM 1 and input RAM "APRIORI", store input metrics for two window lengths (WL). A two-WL architecture employs two memory banks, RAMs 11A and 11B, each saving one WL of the input message. RAMs 11C and 11D each store an apriori message corresponding to the input messages in 11A and 11B, each storing one WL of input metrics. The input RAMs 11 can be read from and written to in either a forward or reverse direction, as controlled by the windows control state machine 30 shown in Figure 3.
Figure 4 depicts an interleaving-deinterleaving process using a single interleaver (not shown). The process involves an input RAM 11, an extrinsic RAM 6 and a SISO decoder unit 5.
The process 40 has two phases, a first half iteration and a second half iteration. In the first half iteration, non-interleaved data is read from the input RAM 11 to the decoder unit 5. In the same phase, non-interleaved data is written from the decoder unit 5 to the extrinsic RAM 6. Also in the first half iteration, non-interleaved data is read from the extrinsic RAM 6 to the decoder unit 5. In the second half iteration, interleaved data is read from the input RAM 11 to the decoder unit 5, interleaved data is written to the extrinsic RAM 6 and interleaved data is read from the extrinsic RAM 6.
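The two half iterations can be modelled minimally as follows. This is an assumption-laden sketch of the Figure 4 data flow, not the patented datapath: `interleave()` is a stand-in QPP permutation for a toy frame of N = 8, and the "decode" step is reduced to a trivial arithmetic placeholder.

```python
# Toy model of the two half iterations of Figure 4: the first half
# iteration accesses the RAMs in natural order, the second in
# interleaved order, so a single permutation serves both directions.

N = 8
def interleave(x):                       # stand-in QPP: (x + 2x^2) mod 8
    return (x + 2 * x * x) % N

input_ram = list(range(100, 100 + N))    # dummy input metrics
extrinsic = [0] * N

# First half iteration: SISO 1 reads and writes in natural order.
for x in range(N):
    extrinsic[x] = input_ram[x] + 1      # placeholder "decode" producing extrinsic info

# Second half iteration: SISO 2 sees the same data in interleaved order.
second_half_reads = [extrinsic[interleave(x)] for x in range(N)]

# The interleaved reads are a permutation of the extrinsic data: no loss.
assert sorted(second_half_reads) == sorted(extrinsic)
```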
Figure 5 depicts the process controlled by the windows control state machine 30, depicted in Figure 3. This process is for a single decoder unit, or SISO 5. The vertical axis 50 depicts time, divided into four slots: 0-L 52, L-2L 53, 2L-3L 54 and 3L-4L 55.
The horizontal axis 51 represents a data frame, divided into 0-L 56, L-2L 57, 2L-3L 58 and 3L-4L 59.
In the time slot 0-L 52 input metrics corresponding to the first WL, 0-L 56, are written into input RAM 11.
In the time slot L-2L 53, input metrics corresponding to the second WL L-2L 57 are written into RAM 2 11B. The reverse processor beta 22b simultaneously uses these values to calculate reverse state metrics. RAM 1 11A is read in the reverse direction by the forward processor 13 (23A-23G). This step calculates forward state metrics, which are stored in forward state metric RAMs (not shown).
After this latency of two WLs, log likelihood ratios (LLR) are provided at the output of the LLR calculator 17. This occurs in the time slot 2L-3L 58. The LLR calculator calculates the decoded bits by reading the forward state metric RAMs in the forward direction.
RAM 1 11A is read in the forward direction to provide input metrics, corresponding to WL 0-L, for the reverse processor 14 calculations. Forward processor calculations are performed for WL L-2L 57 by reading RAM 2 11B in the reverse direction.
Calculated forward state metric values are also saved in the forward state metric RAM (not shown). In the next time slot 3L-4L, the decoded output for WL L-2L is produced, and the cycle repeats continuously. LIFO RAMs 19 provide the input metrics in reverse order.
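The window pipeline described over the preceding paragraphs can be summarised with a small scheduling model. This is our illustrative reading of Figure 5, not a register-level description: while one window of input is being written, the previous window is processed, and decoded LLRs for the window before that emerge, giving a fixed two-window latency.

```python
# Illustrative schedule for the windowed pipeline of Figure 5: in time
# slot `slot`, window `slot` is written, window `slot-1` is processed,
# and LLRs for window `slot-2` are output (None where idle).

def schedule(num_windows):
    rows = []
    for slot in range(num_windows + 2):   # +2 drains the two-window latency
        rows.append({
            "write_input": slot if slot < num_windows else None,
            "process":     slot - 1 if 0 <= slot - 1 < num_windows else None,
            "llr_out":     slot - 2 if 0 <= slot - 2 < num_windows else None,
        })
    return rows

rows = schedule(4)
# Slot 0 only writes; from slot 2 onwards all three stages overlap.
assert rows[0] == {"write_input": 0, "process": None, "llr_out": None}
assert rows[2] == {"write_input": 2, "process": 1, "llr_out": 0}
assert rows[5] == {"write_input": None, "process": None, "llr_out": 3}
```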
Branch metrics are required for each state and stage of the process depicted in Figures 3 and 5. Branch metrics are calculated using the Euclidean distance of soft input metrics. The decoding unit 5 has an independent branch metric calculation cluster for each of the reverse processor 14 and the forward processor 13, 23A-23G. This is described in Viterbi, referenced above.
The forward and reverse processor calculations will now be described. The main kernel in the traditional MAP (max log MAP) algorithm is the Add-Compare-Select (ACS) operation, which is performed by each forward processor, reverse processor and reverse processor dummy block. Figure 6 schematically depicts the ACS recursion, or the calculation of survivor path metrics. Depicted is the normalization data path 69, which is required because the path metrics are accumulated within the block as they are recursively computed for sliding window ACS computation. The ACS computation, which is well known to the skilled reader, is shown by 65, 66, 67, 68, 69 and 70.
The ACS operation depicted in Figure 6 is a traditional ACS operation for max-log-mapping, as is known to the skilled reader. Inputs 61A and 61B receive APRIORI data.
Inputs 62A-62D receive input symbol data. Branch metrics are computed using the adders 63A and 63D and two other adders, labelled in Figure 6 as BRANCH METRIC. The calculated branch metrics are routed by fixed interconnects 64A to the respective ACS units (for example, 23G). The final state metric outputs 72-73 are fed back to the ACS units (23A-23G) after a delay of one clock cycle provided by D flip-flop 71. The outputs of the adders 65 and 66, which calculate the two competing state metrics, are fed through a compare unit 67 and a select unit 68, which selects the "winning" metric. Normalization is performed by normaliser 69 on the "winning" metric if it threatens to overflow, and FSM multiplexer 70 selects the normalised metrics if normalization has been carried out. The output from each ACS unit ACS0-ACS7 is fed back through pipeline D flip-flops 71 to 64.
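One ACS step can be sketched behaviourally as follows. The normalisation threshold and the operand values are invented for illustration; the real block of Figure 6 operates on fixed-point state metrics across eight parallel ACS units.

```python
# Behavioural sketch of one Add-Compare-Select step with normalisation.

NORM_THRESHOLD = 1 << 14          # assumed overflow guard for the metric width

def acs(metric_a, branch_a, metric_b, branch_b):
    cand_a = metric_a + branch_a  # Add: two competing path metrics
    cand_b = metric_b + branch_b
    winner = max(cand_a, cand_b)  # Compare + Select (max-log convention)
    if winner >= NORM_THRESHOLD:  # Normalise before the metric can overflow
        winner -= NORM_THRESHOLD
    return winner

assert acs(10, 3, 7, 5) == 13                   # 13 beats 12
assert acs(NORM_THRESHOLD - 1, 2, 0, 0) == 1    # wrapped by normalisation
```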
Figure 7 depicts a standard log-likelihood-ratio (LLR) calculator, well known to the skilled reader. The LLR calculator 80 shown in Figure 7 requires values for the forward and reverse state metrics, and the branch metrics. The LLR calculator 80 consists of two identical blocks calculating the LLR of bit 0 and bit 1. The LLR calculation for each branch of the trellis having output 1, for example, is the sum of the branch metric, forward state metric (FSM) and reverse state metric (RSM), followed by selection of the maximum value over all such branches.
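A hedged sketch of this max-log LLR selection follows; the four-branch trellis section and its metric values are invented purely for illustration.

```python
# Max-log LLR as in Figure 7: for each branch, sum FSM + BM + RSM, take
# the maximum over bit-1 branches and over bit-0 branches, and subtract.

def llr(branches):
    """branches: list of (fsm, bm, rsm, bit) tuples for one trellis stage."""
    best = {0: float("-inf"), 1: float("-inf")}
    for fsm, bm, rsm, bit in branches:
        best[bit] = max(best[bit], fsm + bm + rsm)
    return best[1] - best[0]

branches = [
    (1.0, 0.5, 2.0, 1),   # bit-1 branch, sum 3.5
    (2.0, 1.0, 1.5, 1),   # bit-1 branch, sum 4.5  <- winner for bit 1
    (0.5, 0.5, 1.0, 0),   # bit-0 branch, sum 2.0
    (1.5, 1.0, 0.5, 0),   # bit-0 branch, sum 3.0  <- winner for bit 0
]
assert llr(branches) == 1.5   # 4.5 - 3.0
```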
Figure 8 schematically depicts a VLSI implementation of an interleaver-deinterleaver mechanism. A QPP interleaver 90 provides direct address calculation along with a read/write mechanism for each half iteration with reference to Figure 4. The numerals in italics depict example values rather than reference numerals. A read interleaved address is stored in a FIFO buffer 20A, 20B, depicted in Figure 2. The length of the buffer is determined by the latency of decoding, which corresponds to the difference between read and write for extrinsic RAMs 6. This mechanism allows a single VLSI interleaver to be used for both interleaving and deinterleaving. Contention free access is inherent in QPP interleaving and this is exploited with this mechanism using barrel shifters described with reference to Figures 11 and 12.
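The single-interleaver mechanism can be modelled in a few lines. The QPP parameters, frame size and latency value below are illustrative stand-ins, not values from the patent: the point is that the read address generated by the QPP is delayed in a FIFO whose depth equals the decoding latency and then reused for the write, so one address generator serves both interleaving and deinterleaving.

```python
from collections import deque

N, F1, F2 = 16, 1, 4           # example QPP: pi(x) = (x + 4x^2) mod 16
LATENCY = 3                    # stand-in read-to-write decoding latency

def qpp(x):
    return (F1 * x + F2 * x * x) % N

fifo = deque()
writes = []
for x in range(N + LATENCY):
    if x < N:
        fifo.append(qpp(x))    # interleaved read address, remembered in FIFO
    if len(fifo) > LATENCY or x >= N:
        writes.append(fifo.popleft())   # same address reused for the write

# Every extrinsic RAM location is written exactly once.
assert sorted(writes) == list(range(N))
```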
The interconnection of decoder units 5 and I/O RAM units, or clusters, 6 according to a specific embodiment of the present invention will now be described with reference to Figure 8. The RAM unit 6 is placed at the input and output of decoder units, specifically VLSI implemented SISO ML clusters 5. The interconnection of these is reconfigurable so that a selected number of decoder units 5 can be used in parallel. In this specific embodiment the number of decoder units 5 connected in parallel can be selected from a set comprising: 1, 2, 4, 8 and 16. This specific embodiment has sixteen decoder units in total.
Table 1 displays information for various configurations of the data decoder device 1 according to a specific embodiment of the invention. The table shows selected numbers of decoder units and RAM sizes in the first column, frame sizes in the second column, number of decoder units in parallel in the third column, data rate in the fourth column, and power consumption in milliwatts in the fifth column.
Table 1
Reconfiguration Context      Frame Size   No of SISO   Data Rate   Power
(No of SISOs x Size                       Blocks       Mb/sec      mW
of I/O RAMs)
1x2K                         2048         1            7.8         53.0
1x4K                         4096         1            8.08        56.5
2x2K                         4096         2            14.81       89.0
1x8K                         8192         1            8.20        63.5
2x4K                         8192         2            16.16       96.0
4x2K                         8192         4            31.37       161
2x8K                         16384        2            16.4        110
4x4K                         16384        4            32.32       175
8x2K                         16384        8            59.2        305
4x8K                         32768        4            32.8        190
8x4K                         32768        8            64.64       333
16x2K                        32768        16           125         593
8x8K                         65536        8            65.6        390
16x4K                        65536        16           129.29      650
16x8K                        128K         16           131.28      762

Table 1. Power and speed results with varying parallelism and frame sizes.
As an example, the frame size 4096 can be decoded by either one decoder unit operating on the whole frame, or two decoder units each operating on half of a 4096 frame. Other examples will be apparent to the skilled reader from observation of Table 1.
As shown by column 1 of Table 1, the I/O RAMs 6 of this specific embodiment can be reconfigured as: 1 x 8K RAM; 2 x 4K RAMs; or 4 x 2K RAMs.
Reconfiguration of the RAM is provided by the wrapper (not shown), which is internal to the RAM unit 6. Therefore, routing resources external to the RAM unit 6 are not used for the reconfiguration depicted in Table 1.
Figure 9 shows various configurations of an I/O RAM unit 6.
The mechanism for providing contention-free access for parallel decoder units 5 to RAM units 6 will now be described. In general, parallel decoding involves an input frame of size N being divided into M segments, with each decoder assigned to perform the same algorithm on its respective segment. In the specific embodiment depicted in Figure 1a, each decoder unit 5 has I/O RAM addresses 6A-6P corresponding to its respective segment. The decoder units 5A-5P all access the RAM addresses 6A-6P at the same time. Therefore, as is well understood by the skilled reader, RAM access needs to be coordinated to avoid collisions occurring when two or more decoder units 5 require access to the same RAM unit 6.
A ring-type access, or interconnection address pattern, is achieved for QPP interleaving by suitable selection of 1, 2, 4, 8 or 16 decoder units 5 and a suitable selection of the frame size. In two examples, a block size of 16K is decoded by 4 SISO units 5 in parallel and 8 SISO units in parallel respectively. These examples are depicted in Figure 10.
In Figure 10, the outer circle of each ring pattern depicts a data set representing each SISO, or decoder unit 5. The inner circle depicts a data set representing each extrinsic RAM address 6. The alignment of the data sets maps a decoder unit 5 to a RAM unit 6. Figure 10 depicts only representations of the decoder units 5 and the RAM units 6, but lower level interconnection addressing based on this principle will be apparent to the skilled reader.
The patterns formed by rings 90A and 91A, and also 90C and 91C, correspond to an interconnection access pattern for a data decoder operation using 4 decoder units 5 in parallel. The patterns represented by the rings 90B and 91B, and also 90D and 91D, represent a SISO operation using 8 decoder units 5 in parallel. Here, a corresponding 8 RAM addresses will be used.
As discussed herein, an interconnection access address for the 4 parallel decoder operation is depicted by the relative alignment of the ring sections. For example, the section 93A aligned with section 94A represents decoder unit 1 accessing RAM unit 1. It will be apparent that decoder unit 1 and RAM unit 1 are simply representations to simplify illustration of the interconnection addressing methodology of this specific embodiment. If the section 93A is aligned with 94A then 93D will be aligned with 94B. This corresponds to decoder unit 4 accessing RAM unit 2. It will be apparent that the sections 93 and 94 will provide contention-free interconnection addressing for each decoder unit based on an interconnection address for one decoder unit.
If 93A now aligns with 94D then 93D will align with 94A. This scenario is depicted by the rings 90D and 91D. The scenarios represented by 90A and 90C correspond to a clockwise shift of the outer ring 90A by three segments. This shifting may be termed circular or barrel shifting.
This specific embodiment of the present invention has a barrel shifter unit 3, as depicted in Figure la. The barrel shifter 3 performs barrel shift operations on data sets, represented by the ring segments 93A-93D or 94A-94D. These operations adjust the relative alignment of these data sets and therefore provide updated contention-free interleaved interconnection addresses. As a barrel shift operation is relatively quick, these operations allow interconnection addresses to be generated each half iteration.
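The ring alignment of Figures 10 to 12 can be sketched with a toy circular-shift model. This is an assumption-level illustration, not the VLSI barrel shifter: it shows how a single shift, derived from the first decoder unit's interleaved address, simultaneously gives every decoder unit a distinct (contention-free) RAM unit.

```python
# Toy model of the decoder-to-RAM ring alignment adjusted by a barrel shift.

def barrel_shift(data, shift):
    """Rotate the list right by `shift` positions."""
    n = len(data)
    shift %= n
    return data[-shift:] + data[:-shift]

decoders = [f"SISO{i}" for i in range(1, 5)]   # 4 parallel decoder units
rams = [f"RAM{i}" for i in range(1, 5)]        # 4 extrinsic RAM units

# Suppose the first decoder's interleaved address targets RAM 3 (offset 2).
# One barrel shift aligns every decoder with a distinct RAM unit.
mapping = dict(zip(barrel_shift(decoders, 2), rams))
assert mapping == {"SISO3": "RAM1", "SISO4": "RAM2",
                   "SISO1": "RAM3", "SISO2": "RAM4"}
```

Because only the shift amount changes between half iterations, re-deriving the whole interconnection in this way is a single cheap operation.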
The operation of the barrel shifter unit 3, according to the specific embodiment, and its effect on interconnection of decoder units 5 with RAM units 6 will now be illustrated with reference to Figure 11. In Figure 11, decoder units 5 are depicted along with RAM units 6. An interconnect 100 is depicted between the decoder units 5 and the RAM units 6. The operation of the interconnect is defined in terms of data lines 104 and 107, read addresses 105 and 108 and write addresses 106 and 109.
The barrel shifter unit 3 has a number of frame-size specific barrel shifters 101-103.
The barrel shifter unit 101 is configured for 2K data segments, the barrel shifter 102 is configured for 4K data segments, and barrel shifter 103 is configured for 8K data segments. It will be understood by the skilled reader that the number and configuration of these barrel shifters 101-103 is determined by the number of decoder units 5 and RAM units 6 used in given configurations of the data decoder device 1.
A frame-size specific barrel shifter, such as 101-103, is depicted in Figure 12. Decoder (output/input) units 5A-5P output/input blocks of data D1-D16, depicted by boxes 110A-110P. The same blocks of data, d1-d16, are represented by boxes 111A-111P where they are input/output from RAM addresses 6A-6P. During a barrel shift the alignment of D1-D16 is shifted relative to d1-d16 to, in effect, change the interconnection address between the SISO units 5A-5P and the RAM units 6A-6P. Figure 12 represents a clockwise barrel shift by one address, for a data set of sixteen elements. Here the input of the barrel shift, D16, is rotated to the location d16, which is connected directly to RAM unit 1. The interconnection addresses, which may also be referred to as data lines, for the rest of the decoder units 5 and RAM units 6 are automatically aligned by the same barrel shift.
Different reconfiguration options are shown in Table 1. For example, for 2K RAM segmentation 1, 2, 4, 8 or 16 decoder or SISO units 5 can be selected. The output of the selected SISO units 5, for each of these arrangements, is connected to one of the barrel shifter segments 101-103. The output from the frame size specific barrel shifter 101-103 is similarly connected to the RAM units 6A-6P by static reconfigurable components.
After an initial reconfiguration, which prepares the array 2 for a particular throughput, dynamic reconfiguration of decoder to RAM interconnection can take place per iteration or half iteration. This takes advantage of the speed of barrel shift operations and the observed circular RAM access addressing patterns that can be achieved with QPP interleaving. The dynamic configuration aligns the interconnect 100 for contention-free, high throughput RAM access for interleaved addresses. The initial reconfiguration also determines a power consumption corresponding to a particular rate of data decoding, or throughput. This allows the power consumption of the data decoder device 1 to be adapted for a given data rate. The reconfiguration allows fewer decoder units 5 to be selected for use in parallel when a lower throughput is required.
The barrel shifter unit 3, having frame-size specific barrel shifters 101-103, is implemented externally of the array with a static interconnect 100 that connects one of the shifters to the array as shown in Figure 11. The static connection depends on memory segmentation, such as 2K, 4K or 8K for example. This static interconnect 100 connects the next data and address lines 104-106 to barrel shifters 101-103 and connects outputs from the barrel shifters 101-103 to RAM units 6. Output from a barrel shifter is the barrel-shifted value of the input, dependent on the interleaved address from the first SISO unit 5.
Figure 13 is a graph depicting the area occupied by individual components of a VLSI implemented data decoder array according to a specific embodiment of the present invention. The area given is a percentage of the total area of the array. Figure 13 shows the percentage areas of: a forward processor 131; a reverse processor 132; a reverse processor acquisition unit 133; a FSM RAM 134; a FIFO buffer-interleaver 135; an input RAM 136; a modulation interleaver 137; a multiplier (23x13) 138; an output RAM 139; a scratch pad RAM APRIORI A 140; a scratch pad RAM APRIORI B 141; a scratchpad RAM A-input matrix 142; and a scratchpad RAM B-input matrix 143.
Table 2 shows the area results for an array implemented in 90nm Toshiba (TM) Complementary Metal-Oxide-Semiconductor (CMOS) technology.
Table 2
Area Results
Library Used            Tc300c_mg_worst_DC
Number of ports         896
Number of nets          41072
Number of cells         14400
Number of references    2464
Combinational Area      1662772.426 microns
Net Interconnect area   4731150
Total cell area         6386811.5
Total Area              11117961.5 microns (11.11 mm2)

Figure 14 shows a graph of the percentage power consumption of individual VLSI components of a decoder array according to a specific embodiment of the present invention. Power consumption is given as a percentage of the total power consumption of the decoder array. Depicted are the power consumptions of the components: sub for LLR 1 151; forward processor 152; reverse processor 153; reverse processor beta 154; branch metrics 1 155; branch metrics 2 156; branch metrics 3 157; input APRIORI scratchpad RAM 158; RAM LLRA 159; input metrics scratchpad A 160; input metrics scratchpad B 161; input RAM 162; output RAM 163; FIFO buffer with interleaver 164; modulator-interleaver 165; multF2_l_square-interleaver 165; and FSM RAM 166.
Table 3 shows a summary of specifications and results of a data decoder device according to a specific embodiment of the present invention.
Table 3
Technology              90 nm standard cell CMOS (TC300C)
Code rate               1/2, 1/3, 1/4, 1/5
Constraint length       9 (256 states)
Generator polynomial    3GPP LTE
Window length           32 (flexible)
Decision level          4 bit soft decision
ACS units               8
Power supply            1.1 V
Operating frequency     100 MHz
Average Power           53.0 - 762 mW (data rate dependent)
Total Area              11.1 mm2
Data Rate               7.8 - 131.28 Mb/sec (flexible)
Interleaver             Contention free - QPP (memoryless)
Block Size              Up to 128K

Table 4 shows specifications of a data decoder device according to a specific embodiment of the present invention.
Table 4
                      TRL - Ours
Throughput            131.2 Mb/sec
Area (mm2)            11.1
Interleaver           QPP contention free (non SRAM - hardware implemented)
Power                 53 - 762 mW; 0.94 nJ/bit/iter; 5.69 nJ/bit
Frequency (MHz)       100
Technology (nm)       90
Block length          Up to 128K

Figure 15 shows the ratio of parallelizing and decoding speeds for a decoder unit according to a specific embodiment of the present invention. The decoding data rate in Mb/sec is shown on the vertical axis and the number of parallel SISO decoder units is shown on the horizontal axis. The lines 171 to 175 depict frame sizes 1024-6144.
This figure shows improvement in decoding data rate with a number of SISO decoder units used in parallel for various frame sizes.
Figures 16 and 17 show decoding data rates achieved by a data decoder according to a specific embodiment of the present invention. The figures show frame size on the vertical axis and data rate on the horizontal axis and show a different plot for each number of SISO decoder units used in parallel. The graph 181 depicts a single SISO decoder 5 configuration. The graph 182 depicts a two SISO decoder configuration.
The graph 183 depicts a four SISO configuration. The graph 184 depicts an eight SISO configuration. Finally, the graph 185, of Figure 17, depicts a sixteen SISO decoder configuration.
A specific embodiment of the present invention is implemented using Field Programmable Gate Arrays (FPGA). Configuration of an FPGA, or similar device, may, in specific embodiments of the present invention, be performed using a computer that includes or has access to a storage medium bearing computer readable instructions.
Execution of these instructions, in these specific embodiments, causes the computer, with appropriate peripheral hardware, to carry out processes to configure an FPGA, or other configurable device, to provide devices according to specific embodiments of the present invention, or to carry out processes according to specific embodiments of the present invention. These instructions may be in the form of source code in a hardware description language, such as VHDL or Verilog, to be used in conjunction with a computer. These instructions may be stored on a computer readable storage medium or transmitted to a computer.
Alternative specific embodiments may include Application Specific Integrated Circuits (ASICs) configured to provide devices or carry out processes according to specific embodiments described herein.
Further specific embodiments of the present invention include a communications device which includes a data decoder device according to specific embodiments described herein. Yet further specific embodiments are portable communications devices; examples include devices which communicate using 3GPP protocols.
Although the above-described embodiments of the invention are intended to inform the reader as to the possibilities for implementation of the invention, the invention is not limited to such embodiments. Indeed, the reader will appreciate that many alternative embodiments, and modifications, replacements or omissions of individual features of the illustrated embodiments, are possible within the scope of the invention. The invention should instead be read as being defined by the appended claims, which can be read in conjunction with, but should not be considered limited by, the present description and accompanying drawings.
This combination of dynamic and static reconfiguration, in a barrel-shifted circular ring representation exploiting an inherent property of QPP interleaving, provides relatively reduced VLSI area and power consumption for an interconnect and yields a high throughput turbo decoding array.
The turbo decoding reconfigurable array devices according to specific embodiments of the present invention described herein provide a reconfigurable device which can adapt its power consumption for given rates of data decoding.
Inclusion of an extrinsic unified state machine 4 allows the data decoder device to operate with only intermittent microprocessor supervision.
The invention will be understood to be exemplified by, but not bound by, the preceding description of specific embodiments thereof. Variants and modifications of the above-described embodiments should not be read as being outside the scope of the invention.
The scope of the invention should be considered by reference to the claims appended hereto, read with reference to but not dictated by the supporting description in the accompanying drawings.

Claims (20)

CLAIMS:

1. A data decoding device operable to perform interleaved decoding operations comprising: an array of decoder units, the array operable to provide a selected number of decoder units for use in parallel; one or more random access memory (RAM) addresses; an interconnect operable to provide access to the one or more RAM addresses for the selected number of decoder units; an addressing unit operable to provide access addresses for the interconnection of the selected number of decoder units and RAM addresses, wherein an access address relates a decoder unit to a RAM address, and wherein the addressing unit is operable to perform barrel shift operations to shift a data set representing each of the selected number of decoder units relative to a data set representing a corresponding number of RAM addresses to provide updated interconnection addresses.
2. A device as claimed in claim 1, wherein the device is operable to perform Quadratic Polynomial Permutation (QPP) interleaved decoding operations.
3. A device as claimed in claim 2, wherein the selected number of decoder units is constrained to a defined set of numbers, the defined set chosen for a size of frame for data to be decoded.
4. A device as claimed in claim 3, wherein the defined set of numbers comprises: 1, 2, 4, 8 and 16.
5. A device as claimed in any one of the preceding claims wherein the number of decoder units for use in parallel is selected according to a rate of data to be decoded.
6. A device as claimed in any one of the preceding claims wherein the device is operable to perform iterations of the interleaved decoder operations and the addressing unit is operable to perform a barrel shift operation for each iteration.
7. A device as claimed in any preceding claim wherein the addressing unit is extrinsic to the array.
8. A device as claimed in any one of the preceding claims wherein the RAM addresses correspond to a RAM which is extrinsic to the array.
9. A device as claimed in any one of the preceding claims wherein the array comprises an integrated circuit.
10. A device as claimed in claim 9 wherein the integrated circuit comprises a very-large-scale integration (VLSI) integrated circuit.
11. A device as claimed in any one of the preceding claims wherein the parallel decoder units are operable to perform maximum likelihood decoder operations.
12. A device as claimed in any one of the preceding claims wherein the decoder units are operable to perform soft-in soft-out decoder operations.
13. A device as claimed in claim 12 wherein the device is operable to perform turbo decoding.
14. A device as claimed in any one of the preceding claims comprising a configuration controller operable to select the number of decoder units in parallel.
15. A device as claimed in claim 14 wherein the configuration controller comprises a state machine.
16. A device as claimed in claim 14 or 15 wherein the configuration controller is extrinsic to the array.
17. A device as claimed in any one of the preceding claims wherein the interconnect comprises an integrated circuit implementation.
18. A method of decoding data comprising performing interleaved decoding operations using a selected number of decoder units in parallel and using one or more RAM addresses with iterative RAM access operations, wherein the method further comprises: storing a first data set having elements representing each selected decoder unit; storing a second data set having elements representing each RAM address; performing one or more barrel shift operations to shift the first data set relative to the second data set; performing one or more RAM access operations according to interconnection addresses provided by the relative alignment of the first and second data sets.
19. A data decoder device operable to perform interleaved decoder operations, the device comprising: an array of decoder units, operable to provide a selected number of parallel decoder units for use in parallel; one or more random access memory (RAM) units; an interconnect operable to provide access to the one or more RAM units for decoder units; an addressing unit operable to provide contention free access addresses for the interconnection of the selected number of decoder units and RAM addresses, wherein an access address relates a decoder unit to a RAM address; a controller operable to select the number of decoder units according to characteristics of data to be decoded and operable to control the interconnect to provide interleaved access to the RAM for the selected number of decoder units.
  20. 20. A computer program product bearing computer executable instructions which, when executed in a computer, will cause the computer to configure a configurable device to provide the device of any of claims I to 17 or I 9.Amendments to the claims have been filed as follows CLAIMS: 1. A data decoding device operable to perform interleaved decoding operations comprising: an array of decoder units, the array operable to provide a selected number of decoder units for use in parallel; one or more random access memory (RAM) addresses; an interconnect operable to provide access to the one or more RAM addresses for the selected number of decoder units; an addressing unit operable to provide access addresses for the interconnection of the selected number of decoder units and RAM addresses, wherein an access address relates a decoder unit to a RAM address, and wherein the addressing unit is operable to perform barrel shift operations to shift a data set representing each of the selected number of decoder units relative to a data set representing a corresponding number of RAM addresses to provide updated interconnection addresses.2. A device as claimed in claim 1, wherein the device is operable to perform Quadratic Polynomial Permutation (QPP) interleaved decoding operations.3. A device as claimed in claim 2, wherein the selected number of decoder units is constrained to a defined set of numbers, the defined set chosen for a size of frame for data to be decoded.4. A device as claimed in claim 3, wherein the defined set of numbers comprises: 1,2,4,8and 16.5. A device as claimed in any one of the preceding claims wherein the number of decoder units for use in parallel is selected according to a rate of data to be decoded.6. A device as claimed in any one of the preceding claims wherein the device is operable to perform iterations of the interleaved decoder operations and the addressing unit is operable to perform a barrel shift operation for each iteration.7. 
A device as claimed in any preceding claim wherein the addressing unit is extrinsic to the array.8. A device as claimed in any one of the preceding claims wherein the RAM addresses correspond to a RAM which is extrinsic to the array.9. A device as claimed in any one of the preceding claims wherein the array comprises an integrated circuit.10. A device as claimed in claim 9 wherein the integrated circuit comprises a very-large-scale integration (VLSI) integrated circuit.11. A device as claimed in any one of the preceding claims wherein the parallel decoder units are operable to perform maximum likelihood decoder operations.12. A device' as claimed in any one of the preceding claims wherein the decoder units are operable to perform soft-in soft-out decoder operations.13. A device as claimed in claim 12 wherein the device is operable to perform turbo decoding.14. A device as claimed in any one of the preceding claims comprising a configuration controller operable to select the number of decoder units in parallel.15. A device as claimed in claim 14 wherein the configuration controller comprises a state machine.16. A device as claimed in claim 14 or 15 wherein the configuration controller is extrinsic to the array.17. A device as claimed in any one of the preceding claims wherein the interconnect comprises an integrated circuit implementation.18. 
A method of decoding data comprising performing interleaved decoding operations using a selected number of decoder units in parallel and using one or more RAM addresses with iterative RAM access operations, wherein the method further comprises: storing a first data set having elements representing each of the selected number of decoder units; storing a second data set representing a number of RAM addresses corresponding to the selected number of decoder units; performing one or more barrel shift operations to shift the first data set relative to the second data set; performing one or more RAM access operations according to interconnection addresses provided by the relative alignment of the first and second data sets.19. A computer program product bearing computer executable instructions which, when executed in a computer1 will cause the computer to configure a configurable device to provide the device of any of claims ito 17. * * . S. S * . * S.. S. * S * S.. *... * S *SS. *. S. * S SS S
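The claims combine Quadratic Polynomial Permutation (QPP) interleaving (claims 2-4) with barrel-shift realignment of decoder units against RAM addresses on each iteration (claims 6 and 18). The property that makes this work is that QPP interleavers are contention-free: when the number of parallel units P divides the frame size K, the P simultaneously generated interleaved addresses always fall in P distinct memory banks. The sketch below illustrates this; it is not taken from the patent, and the parameters f1 = 3, f2 = 10 for K = 40 are assumed from the 3GPP LTE interleaver table, purely for illustration.

```python
def qpp_interleave(i, f1, f2, K):
    """Quadratic Polynomial Permutation: pi(i) = (f1*i + f2*i^2) mod K."""
    return (f1 * i + f2 * i * i) % K

def barrel_shift(banks, shift):
    """Rotate a list of RAM-bank indices by `shift` positions (claim 18)."""
    shift %= len(banks)
    return banks[shift:] + banks[:shift]

# Illustrative parameters (f1=3, f2=10 for K=40 are LTE table values).
K, f1, f2 = 40, 3, 10
P = 4                # selected number of parallel decoder units (claim 4 set)
W = K // P           # window of the frame handled by each decoder unit

# Contention-free check: at every step t, the P interleaved addresses
# pi(t), pi(t+W), ..., pi(t+(P-1)W) must land in P distinct banks,
# where bank(addr) = addr // W.
contention_free = all(
    len({qpp_interleave(t + p * W, f1, f2, K) // W for p in range(P)}) == P
    for t in range(W)
)
print(contention_free)  # -> True

# The bank visited by each unit rotates from step to step, so the
# decoder-to-RAM interconnect can be updated with a barrel shift
# rather than a full crossbar with arbitration.
mapping = dict(zip(range(P), barrel_shift(list(range(P)), 1)))
```

For this QPP, pi(t + pW) differs from pi(t) by a multiple of W modulo K, so the P concurrent accesses are always a rotation of the bank set; a barrel shifter in the addressing unit is then sufficient to keep every decoder unit connected to a free bank, with no stall cycles.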
GB0815531A 2008-08-26 2008-08-26 A data decoding device and method Expired - Fee Related GB2463011B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0815531A GB2463011B (en) 2008-08-26 2008-08-26 A data decoding device and method


Publications (3)

Publication Number Publication Date
GB0815531D0 GB0815531D0 (en) 2008-10-01
GB2463011A true GB2463011A (en) 2010-03-03
GB2463011B GB2463011B (en) 2010-12-29

Family

ID=39846823

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0815531A Expired - Fee Related GB2463011B (en) 2008-08-26 2008-08-26 A data decoding device and method

Country Status (1)

Country Link
GB (1) GB2463011B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011077947A (en) * 2009-09-30 2011-04-14 Fujitsu Ltd Turbo decoding device and communication device
EP2621091B1 (en) * 2010-09-25 2017-09-06 ZTE Corporation Turbo code parallel interleaving with quadratic permutation polynomial (qpp) functions

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070124655A1 (en) * 2005-11-10 2007-05-31 Samsung Electronics Co., Ltd. Apparatus and method for a collision-free parallel turbo decoder in a software-defined radio system
EP1931035A1 (en) * 2006-11-27 2008-06-11 Fujitsu Limited Turbo decoder and turbo decoding method




Similar Documents

Publication Publication Date Title
Sun et al. Efficient hardware implementation of a highly-parallel 3GPP LTE/LTE-advance turbo decoder
Wang et al. Parallel interleaver design for a high throughput HSPA+/LTE multi-standard turbo decoder
US7908542B2 (en) Method of and apparatus for implementing a reconfigurable trellis-type decoding
JP4478668B2 (en) Method and system for interleaving in parallel turbo decoders.
CA2567248A1 (en) A method of and apparatus for implementing a reconfigurable trellis-type decoding
Sun et al. Configurable and scalable high throughput turbo decoder architecture for multiple 4G wireless standards
JP5840741B2 (en) Method and apparatus for programmable decoding of multiple code types
KR20050080720A (en) Decoding system supporting turbo decoding and viterbi decoding
JP5700035B2 (en) Error correction code decoding apparatus, error correction code decoding method, and error correction code decoding program
Prescher et al. A parametrizable low-power high-throughput turbo-decoder
Gonzalez-Perez et al. Parallel and configurable turbo decoder implementation for 3GPP-LTE
Wang et al. High-throughput Contention-Free concurrent interleaver architecture for multi-standard turbo decoder
Lee et al. Architecture design of QPP interleaver for parallel turbo decoding
GB2463011A (en) Interleaved decoding using a selectable number of parallel decoding units interconnected with RAM units
US20080115032A1 (en) Efficient almost regular permutation (ARP) interleaver and method
Abbasfar et al. An efficient and practical architecture for high speed turbo decoders
Lin et al. A 40 nm 535 Mbps multiple code-rate turbo decoder chip using reciprocal dual trellis
Li et al. Unified convolutional/turbo decoder design using tile-based timing analysis of VA/MAP kernel
Chen et al. A 691 Mbps 1.392 mm² configurable radix-16 turbo decoder ASIC for 3GPP-LTE and WiMAX systems in 65nm CMOS
Asghar et al. Implementation of a Radix-4, parallel turbo decoder and enabling the multi-standard support
Mathana et al. Low complexity reconfigurable turbo decoder for wireless communication systems
Shin et al. A programmable turbo decoder for multiple 3G wireless standards
Wong et al. A 0.22 nJ/b/iter 0.13 μm turbo decoder chip using inter-block permutation interleaver
Ahmed et al. A Reconfigurable Viterbi decoder for a communication platform
Abdel-Hamid et al. Memory conflict analysis for a multi-standard, reconfigurable turbo decoder

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20130826