GB2418817A

GB2418817A - Decoder with plural maximum likelihood detectors which share distance metric information assigned to parallel processors

Info

Publication number: GB2418817A
Application number: GB0421609A
Authority: GB
Inventors: Mong Suan Yee
Original assignee: Toshiba Research Europe Ltd
Current assignee: Toshiba Europe Ltd
Priority date: 2004-09-29
Filing date: 2004-09-29
Publication date: 2006-04-05
Anticipated expiration: 2024-09-29
Also published as: GB0421609D0; GB2418817B

Abstract

Decoder for a signal comprising a string of single or multi-bit symbols. The decoder comprises multiple maximum likelihood (ML) sphere detectors. Each detector is associated with a particular value of one bit of the symbol string and determines a minimum distance metric between a received signal point and an estimated received signal point. The detectors then share their distance metrics with the other detectors (see Figs. 2,3), which then use them in determining the sphere radius they will use in their next search (see Fig. 5). The system may use multiple processors in which case each bit value detector is allocated to a processor in turn, typically MSB first. When the allowed number of operations for a given string has been reached, distance metrics and assumed bit values are passed to a Max-Log-MAP decoder to determine a most likely symbol string and the processors are assigned to the sphere detectors for the next string.

Description

Signal Decoding Methods and Apparatus

This invention is generally concerned with apparatus, methods, processor control code and architectures for signal decoding, in particular decoding employing a plurality of maximum likelihood hard detectors such as sphere decoders.

This application is related to the Applicant's co-pending UK patent applications numbers 0323211.3 and 0416820.9 filed on 3 Oct 2003 and 28 July 2004 respectively, to which reference may be made.

A general problem in the field of signal processing relates to the transmission of a signal from a transmitter to a receiver over a channel, the problem being to determine the transmitted signal from the received signal. The received signal is affected by the channel impulse response or 'memory' of the channel, which can cause interference between successively transmitted symbols, and the transmitted signal may also have been encoded prior to sending. A decoder or detector at the receiver has the problem of decoding or detecting the originally transmitted data and/or the original data that has been encoded at the transmitter. In this specification the terms detector and decoder are used interchangeably since they both essentially try to solve a similar problem, that is detecting the original data.

The optimum decoder is the a posteriori probability (APP) detector or decoder, which performs an exhaustive search in the space of all possible transmitted symbol combinations, followed by a calculation of the corresponding joint posterior distributions. However the computational complexity of such an approach grows exponentially with the memory of the encoder or channel impulse response length, and with the number of bits per symbol, or possible symbols or codewords per transmission to consider. Sub-optimal approaches are therefore of technical and commercial interest.

One reduced complexity approximation to the APP solution is the so-called max-log approximation. Broadly speaking, determining a bit likelihood value according to this approach involves determining maximum values for two terms, one of which corresponds to the bit having a first logic value, say +1, the other corresponding to the bit having a second logic value, say -1. It has been recognised that maximising each of these terms corresponds to minimising a related distance metric for a candidate string of transmitted symbols, preferably taking into account any a priori knowledge which can act as a soft input to the procedure. In embodiments of the invention sphere decoding may be employed to search for a minimum such metric.
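The max-log idea can be illustrated with a small sketch. It is not taken from the specification: it assumes a real-valued toy channel matrix `H`, bits mapped to +1/-1, no a priori input, and an exhaustive search in place of a sphere decoder. The function names `distance_metric` and `max_log_llrs` are hypothetical. For each bit, the minimum distance metric is found for each of its two values, and the difference of the two minima is taken as the bit likelihood value.

```python
from itertools import product


def distance_metric(r, H, x, sigma2):
    """Scaled squared distance between r and the noiseless point x*H (row-vector convention)."""
    n_rx = len(r)
    est = [sum(x[i] * H[i][j] for i in range(len(x))) for j in range(n_rx)]
    return sum((r[j] - est[j]) ** 2 for j in range(n_rx)) / (2.0 * sigma2)


def max_log_llrs(r, H, sigma2):
    """Max-log bit likelihoods: min metric with bit = -1 minus min metric with bit = +1."""
    n_bits = len(H)
    best = {(j, b): float("inf") for j in range(n_bits) for b in (+1, -1)}
    # Exhaustive search (assumption: small toy problem); a sphere decoder
    # would find the same minima without visiting every candidate.
    for x in product((+1, -1), repeat=n_bits):
        d = distance_metric(r, H, x, sigma2)
        for j, b in enumerate(x):
            if d < best[(j, b)]:
                best[(j, b)] = d
    return [best[(j, -1)] - best[(j, +1)] for j in range(n_bits)]


# Toy example: r is exactly the noiseless point for x = (+1, -1).
H = [[1.0, 0.2], [0.3, 1.0]]
r = [0.7, -0.8]
llrs = max_log_llrs(r, H, sigma2=0.5)
```

With this sign convention a positive value favours +1 and a negative value favours -1, so only two minimum metrics per bit are needed to form each soft output.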

There is a continuing need for increased data rate transmission and, equivalently, for more efficient use of available bandwidth at existing data rates. Presently WLAN (wireless local area network) standards such as Hiperlan/2 (in Europe) and IEEE 802.11a (in the USA) provide data rates of up to 54 Mbit/s. The use of multiple transmit and receive antennas has the potential to dramatically increase these data rates, but decoding signals received over a MIMO channel is difficult because a single receive antenna receives signals from all the transmit antennas. A similar problem arises in multi-user systems, although symbols transmitted over the different channels and from different users with different spreading sequences are then uncorrelated. There is therefore a need for improved decoding techniques for MIMO systems. These techniques have applications in wireless LANs, potentially in fourth generation mobile phone networks, and also in many other types of communication system.

Here, to provide a context helpful for understanding the invention, some reference will be made to applications involving signals received over a MIMO (multiple-input multiple-output) channel, and to space-time decoding. However embodiments of the invention described herein may also be employed in related systems such as CDMA systems, in which signals from multiple users, differentiated by their unique spreading sequence codes, are received by the receivers (and which may also employ MIMO techniques). In a typical CDMA (code division multiple access) system multiple users share the same spectrum and are transmitted/received in the same time slot but are differentiated since they are transmitted with a unique orthogonal spreading code. At the receiver, the signals from different users are separated by 'despreading' based on their unique code. However, synchronization problems and the effect of the channel can destroy the orthogonality and thus extra processing may be needed to differentiate the signals from the different users.

Figure 1 shows a typical MIMO data communications system 100. A data source 102 provides data (comprising information bits or symbols) to a channel encoder 104. The channel encoder typically comprises a convolutional coder such as a recursive systematic convolutional (RSC) encoder, or a stronger so-called turbo encoder (which includes an interleaver). More bits are output than are input, and typically the rate is one half or one third. The channel encoder 104 is followed by a channel interleaver 106 and, in the illustrated example, a space-time encoder 108. The space-time encoder 108 encodes an incoming symbol or symbols as a plurality of code symbols for simultaneous transmission from each of a plurality of transmit antennas 110.

Space-time encoding may be described in terms of an encoding machine, described by a coding matrix, which operates on the data to provide spatial and temporal transmit diversity; this is followed by a modulator to modulate the coded symbols for transmission. Space-frequency encoding may additionally (or alternatively) be employed. Thus, broadly speaking, the space-time and/or frequency encoded symbols are distributed into a grid having space and time and/or frequency coordinates, for increased diversity. Where space-frequency coding is employed the separate frequency channels may be modulated onto OFDM (orthogonal frequency division multiplexed) carriers, a cyclic prefix generally being added to each transmitted symbol to mitigate the effects of channel dispersion.

The encoded transmitted signals propagate through MIMO channel 112 to receive antennas 114, which provide a plurality of inputs to a space-time (and/or frequency) decoder 116. The decoder has the task of removing the effect of the encoder 108 and the MIMO channel 112, and may be implemented by a sphere decoder. The output of the decoder 116 comprises a plurality of signal streams, one for each transmit antenna, each carrying so-called soft or likelihood data on the probability of a transmitted symbol having a particular value. This data is provided to a channel de-interleaver 118 which reverses the effect of channel interleaver 106, and then to a channel decoder 120, such as a Viterbi decoder, which decodes the convolutional code. Typically the channel decoder is a SIHO (soft-in hard-out) decoder, that is, one receiving symbol (or bit) likelihood data and providing data on which a hard decision has been made. The output of channel decoder 120 is provided to a data sink 122, for further processing of the data in any desired manner.

In some communications systems so-called turbo decoding is employed, in which a SISO (soft-in soft-out) channel decoder is required and soft output from channel decoder 120 is provided to a channel interleaver 124, corresponding to channel interleaver 106, which in turn provides soft (likelihood) data to decoder 116 for iterative space-time (and/or frequency) and channel decoding. (It will be appreciated that in such an arrangement channel decoder 120 provides soft information of the complete transmitted bits to decoder 116, that is for example including error check bits.)

Here we will consider the general problem of estimating a string of transmitted symbols from a received signal. The string of symbols may be distributed in space (for example across multiple transmit antennas), time (for example with a space-time block or trellis coder), and/or frequency (for example where multiple frequency channels or carriers are employed). Embodiments of the techniques described herein are applicable to all these problems.

There are many known types of decoder, for example trellis-based decoders (a variant of the maximum likelihood, ML, approach) such as the Viterbi decoder, linear decoders such as zero-forcing and minimum mean squared error (MMSE) estimators, the vertical-BLAST (Bell Labs LAyered Space Time) decoder, the MMSE multi-user detector (MUD), and the block decision feedback equaliser.

A sphere decoder can provide performance which approaches that of an APP decoder but at considerably reduced complexity. Broadly speaking, candidates for the transmitted signal, modified by the channel response (and space-time encoder), are represented as a lattice in which points correspond to possible (noiseless) received signals. The sphere decoding procedure aims to find one or a few lattice points nearest the actually received signal. The procedure performs a search in a multi-dimensional spherical region centred on the actually received signal. The procedure provides a technique for identifying which lattice points are within the required search radius (which may be adjusted according to the noise level and/or channel conditions). The choice of initial search radius can significantly affect the complexity (number of computations) involved in the procedure.

It is helpful, at this point, to provide an outline review of the operation of the sphere decoding procedure. For a string of N transmitted symbols an N-dimensional lattice is searched, beginning with the Nth dimensional layer (corresponding to the first symbol of the string). A symbol is selected for this layer from the constellation employed and the distance of the generated lattice point from the received signal is checked. If the lattice point is within this distance the procedure then chooses a value for the next symbol in the string and checks the distance of the generated lattice point from the received signal in N-1 dimensions. The procedure continues checking each successive symbol in turn, and if all are within the bound it eventually converges on a lattice point in one dimension. If a symbol is outside the chosen radius then the procedure moves back up a layer (dimension) and chooses the next possible symbol in that layer (dimension) for checking. In this way the procedure builds a tree in which the lowest nodes correspond to complete strings of symbols and in which the number of nodes at the nth level of the tree corresponds to the number of lattice points inside the relevant nth dimensional sphere.
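The tree search outlined above can be sketched as a short depth-first routine. This is a minimal illustration under stated assumptions rather than the procedure of any particular embodiment: the model is real-valued, the channel is assumed to have already been reduced to an upper-triangular matrix `R` (for example by a QR decomposition, which is not shown), a column-vector convention y ≈ R x is used, and the sphere radius is shrunk whenever a complete candidate string is found. The name `sphere_decode` is hypothetical.

```python
def sphere_decode(y, R, constellation, radius_sq=float("inf")):
    """Depth-first search for the symbol string x minimising ||y - R x||^2.

    Assumes R is upper triangular, so the residual of row `level`
    depends only on the symbols at positions level..n-1.
    """
    n = len(y)
    best = {"d": radius_sq, "x": None}
    x = [None] * n

    def search(level, partial_d):
        for sym in constellation:
            x[level] = sym
            predicted = sum(R[level][k] * x[k] for k in range(level, n))
            d = partial_d + (y[level] - predicted) ** 2
            if d >= best["d"]:
                continue  # outside the current sphere: prune this branch
            if level == 0:
                # Complete string found: record it and shrink the radius.
                best["d"], best["x"] = d, list(x)
            else:
                search(level - 1, d)

    search(n - 1, 0.0)  # start at the last (Nth) layer
    return best["x"], best["d"]


# Toy example: y is R x for x = [1, -1, 1] plus a small perturbation.
R = [[1.0, 0.5, -0.3],
     [0.0, 1.2, 0.4],
     [0.0, 0.0, 0.9]]
y = [0.25, -0.75, 0.85]
x_hat, d_min = sphere_decode(y, R, constellation=(-1, +1))
```

If a finite initial radius is supplied and no lattice point lies inside it, the sketch returns `None` for the string, which is why an infinite starting radius guarantees that at least one point is found.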

When a complete candidate string of symbols is found the distance of the lattice point, generated from the string of symbols, from the received signal is derived and the initial radius is reduced to this distance so that as the tree builds only closer strings to the maximum-likelihood solution are identified. When the tree has been completed the decoder can be used to provide a hard output, i.e. the maximum likelihood solution, by choosing the nearest lattice point to the received signal. Alternatively a soft output can be provided using a selection of the closest lattice points to the received signal, for example using the distance of each of these from the received signal as an associated likelihood value.

Background prior art relating to sphere decoding can be found in: E. Agrell, T. Eriksson, A. Vardy and K. Zeger, "Closest Point Search in Lattices", IEEE Trans. on Information Theory, vol. 48, no. 8, Aug 2002; E. Viterbo and J. Boutros, "A universal lattice code decoder for fading channels", IEEE Trans. Inform. Theory, vol. 45, no. 5, pp. 1639-1642, Jul. 1999; O. Damen, A. Chkeif and J. C. Belfiore, "Lattice code decoder for space-time codes", IEEE Comms. Letter, vol. 4, no. 5, pp. 161-163, May 2000; B. M. Hochwald and S. T. Brink, "Achieving near capacity on a multiple-antenna channel", IEEE Trans. Commun., vol. 51, no. 3, pp. 389-399, Mar. 2003; US 2003/0076890 to B. M. Hochwald and S. T. Brink; H. Vikalo and B. Hassibi, "Low-complexity iterative detection and decoding of multi-antenna systems employing channel and space-time codes", Conference Record of the Thirty-Sixth Asilomar Conference on Signals and Systems and Computers, vol. 1, Nov 3-6, 2002, pp. 294-298; A. Wiesel, X. Mestre, A. Pages and J. R. Fonollosa, "Efficient Implementation of Sphere Demodulation", Proceedings of IV IEEE Signal Processing Advances in Wireless Communications, pp. 535, Rome, June 15-18, 2003; L. Brunel, J. J. Boutros, "Lattice decoding for joint detection in direct-sequence CDMA systems", IEEE Transactions on Information Theory, Volume: 49, Issue: 4, April 2003, pp. 1030-1037; US patent application US20030076890, filed on July 26, 2002, to B. M. Hochwald, S. Ten Brink, "Method and apparatus for detection and decoding of signals received from a linear propagation channel", Lucent Technologies, Inc.; US patent application US20090114410, filed on August 22, 2002, to L. Brunel, "Multiuser detection method and device in DS-CDMA mode", Mitsubishi Denki Kabushiki Kaisha; H. Vikalo, "Sphere Decoding Algorithms for Digital Communications", PhD Thesis, Stanford University, 2003; and B. Hassibi and H. Vikalo, "Maximum-Likelihood Decoding and Integer Least-Squares: The Expected Complexity", in Multiantenna Channels: Capacity, Coding and Signal Processing, (editors J. Foschini and S. Verdu).

Other background prior art can be found in: J. Luo, K. R. Pattipati, Peter Willett, G. M. Levchuk, "Fast Optimal and Suboptimal Any-Time Algorithms for CDMA Multiuser Detection Based on Branch and Bound", IEEE Transactions on Communications, Volume: 52, Issue: 2, Feb. 2004, Pages: 336 336, which describes a fast optimal algorithm based on the branch-and-bound (BBD) method for the joint detection of binary symbols of K users in a synchronous code-division multiple-access channel with Gaussian noise, as one alternative to sphere decoding; and in M. J. Juntti, T. Schlosser and J. O. Lilleberg, "Genetic algorithms for multiuser detection in synchronous CDMA", Proc. of 1997 IEEE International Symposium on Information Theory, 29 June - 4 July 1997, pg. 492, which describes the application of genetic algorithms to suboptimal multiuser detection as another alternative to sphere decoding; these are hereby incorporated by reference.

We have previously described, in UK patent applications 0323211.3 and 0416820.9 (ibid), how multiple maximum likelihood (ML) detectors or decoders may be employed to provide soft bit-likelihood output values, based upon a max-log MAP (maximum a posteriori probability) approach using multiple hard detection decisions. In this arrangement the minimum distance metric for a bit is determined for each of the two possible values of the bit and thus only two candidate possibly transmitted symbols are required for each bit of the string. The configuration of such a decoder lends itself to a parallel implementation for the plurality of maximum likelihood decoders. In preferred embodiments the maximum likelihood detectors or decoders comprise sphere decoders.

We have also described how a single, common ML symbol detector may be employed in this configuration to determine a (common) distance metric for each bit of a complete, maximum likelihood string of symbols, one further maximum likelihood detector then being provided for each bit of the string of symbols, for determining a minimum distance metric for the relevant bit, each of these maximum likelihood decoders determining the distance metric for a value of the bit different to its value in the maximum likelihood string of symbols. A sphere decoder determining a (minimum) distance metric for a particular bit may have its initial radius set to the value of the metric given by the maximum likelihood string of symbols with the relevant particular bit inverted or 'flipped' (that is, the bit is given its alternative logic value to the logic value it has in the maximum likelihood string of symbols). The decoder which is configured to determine the maximum likelihood string of symbols is preferably initialised to a sphere radius of infinity so that at least one lattice point will always be found. The maximum likelihood string of symbols may be determined taking into account an a priori probability value for each bit of the string, thus facilitating a soft input for the decoder.

The above described max-log MAP approach provides near ML performance with greatly reduced overall complexity. However there is a need for systems, methods and architectures for efficient implementation of the above-described decoder. The inventor has recognised that this need can be addressed through the use of co-operative ML detectors, where information derived during the ML detection operations is shared to assist in the evaluation of soft, bit-likelihood information for the decoded bits. The inventor has further recognised that this need can further be addressed through the efficient exploitation of hardware resources.

According to a first aspect of the present invention there is therefore provided a decoder for decoding a received signal, said received signal being provided by one or more transmitted signals defining a string of symbols, each said symbol comprising one or more bits, said decoder comprising: a plurality of maximum likelihood (ML) detectors each configured to determine a minimum bit-dependent distance metric for a respective candidate string of symbols in which a bit has a defined value, said distance metric being dependent upon a distance of said received signal from an estimated received signal determined from said candidate string; and a bit likelihood estimator to receive a said minimum distance metric from each of said ML detectors and configured to determine a bit likelihood value for each bit of said transmitted string dependent upon said minimum distance metrics; the decoder further comprising: a plurality of data processors, each configured to implement an ML detection task for a said ML detector; and a task distribution system configured to distribute a plurality of said ML detection tasks for said plurality of ML detectors amongst said plurality of data processors.

In embodiments, employing a plurality of data processors (which may comprise hardware, or software, or a combination thereof) to implement ML detection tasks, in combination with a task distribution system, provides a flexible parallel hardware architecture. This is well suited to, for example, software defined radio where the quality of the detection and the complexity can be adjusted according to channel link quality and hardware requirements while employing the same basic algorithm (ML detection) in, for example, a soft-in/soft-out decoder. This is enabled by the decoder structure, which employs multiple, bit-wise ML detectors. Preferably each ML detection task implements an ML detector; an ML detector may comprise a sphere decoder, or some other detection algorithm, for example a genetic algorithm, may be implemented. Preferably the ML detection tasks (and hence ML detectors) share information during their detection operations. Embodiments of the invention can be contrasted with conventional decoding methods, for example Viterbi decoding, linear detection, and list sphere decoding algorithms, which are not adapted to being distributed amongst the plurality of data processors and which thus do not provide the flexible design architecture of embodiments of decoders and related methods according to the invention.

In many cases an ML detection task has a variable duration, for example because the ML detection involves searching through candidate strings of symbols to find the string with the lowest relevant distance metric. However this can conflict with data processing requirements in a practical system in which the decoding must be matched to the data rate: that is, on average the time taken to decode a string of symbols should be equal to or less than a duration of the string of symbols. For this reason preferably an ML detection task is controlled to limit the task duration, for example by truncating the search after a predetermined time limit, and/or number of operations, and/or number of iterations, or searched nodes of a tree. In this context the task distribution system is preferably configured to allocate the ML detection tasks to the data processors such that the distance metric determinations are substantially performed within a time duration of a transmitted string of symbols to be decoded. During this time a said data processor may perform one, two or more ML detection tasks, as time allows. Broadly speaking, the rate of data decoding should on average at least keep pace with the rate of data reception. The skilled person will understand that the task distribution system may itself be distributed between one or more of the plurality of data processors, for example a task rippling through the plurality of data processors until an available processor is reached.

In one embodiment one, or preferably each, of the data processors implements two (or more) of the ML detection tasks within the duration of a transmitted string of symbols. In this way the ML detection tasks can be "stacked" to efficiently pack the detection operations into the available time. Where an ML detection task on a data processor finishes early (as opposed to being truncated) preferably the next ML detection task on that processor is commenced substantially immediately.
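A simple way to picture the stacking of truncated ML detection tasks is as a greedy allocation onto whichever processor becomes free first. The sketch below is illustrative only: the operation counts, the truncation limit `max_ops` and the function name `stack_tasks` are assumptions made for the example, and the time unit is an abstract "operation". Each task runs for its own duration or until the limit, whichever comes first, and the next task starts as soon as a processor is available.

```python
import heapq


def stack_tasks(task_ops, n_processors, max_ops):
    """Greedily stack tasks on processors, truncating any task at max_ops.

    Returns (schedule, makespan), where each schedule entry is
    (task_index, processor, start, end, truncated).
    """
    # Min-heap of (time at which the processor becomes free, processor id).
    free_at = [(0, p) for p in range(n_processors)]
    heapq.heapify(free_at)
    schedule = []
    for idx, ops in enumerate(task_ops):
        start, proc = heapq.heappop(free_at)
        run = min(ops, max_ops)          # truncate long searches
        end = start + run
        schedule.append((idx, proc, start, end, ops > max_ops))
        heapq.heappush(free_at, (end, proc))
    makespan = max(entry[3] for entry in schedule)
    return schedule, makespan


# Six hypothetical detection tasks on two processors, limit of 8 operations.
schedule, makespan = stack_tasks([5, 9, 3, 12, 4, 6], n_processors=2, max_ops=8)
```

In this example the third task begins at time 5, immediately after the first task finishes early on the same processor, while the second and fourth tasks are cut off at the limit. The resulting makespan can then be compared against the duration of a transmitted string of symbols.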

In embodiments a common, maximum likelihood distance metric is determined for a (common) maximum likelihood string of symbols which is unconstrained, that is in which each bit can take any value. This may be determined by one or a pair of ML detectors in a first data processing stage. Once this common ML symbol has been determined a bit-wise ML detector may be allocated to each of the bits in the string of symbols, with the defined bit values for each corresponding to the ML string of symbols with the respective bit reversed or inverted. This set of bit-wise ML detection tasks may be performed in a second stage of processing. The ML detection tasks in the first and second processing stages may be allocated such that the first stage processing is given priority. Thus in a first time interval the first stage processing and a portion of the second stage processing may be implemented, the remainder of the second stage processing being implemented in a second, subsequent time interval. These time intervals may be defined, for example, by the maximum permitted time for an ML detection task. Since not all the available data processors may be required for the remainder of the second stage processing, first stage processing for a next or subsequently received string of symbols may be started on the data processors which are available once the remainder of the second stage processing tasks have been allocated.

In this way the first and second stages of processing overlap in time such that a second stage of processing for a first received string of symbols is partially concurrent with the first stage processing for a second, subsequent received string of symbols.

The task distribution system may be configured to distribute the ML detection tasks in an order which depends upon a desired reliability or confidence level of a bit likelihood value to be derived from an output of the task. For example an ML detection task for a bit for which an increased confidence soft output is required may be delayed in order that advantage may be taken of shared information from previously allocated or completed detection tasks. Additionally or alternatively, where an ML detection task finishes "early", that is before its maximum allowed time, a detection task for an important bit may be started to provide more processing time for that task. Where a search is truncated after a time interval or after a set number of iterations or operations the truncation point may be extended for important bits (provided that the decoder keeps pace with the incoming data overall).

There is a variety of different load/task sharing scheduling mechanisms which can be employed to distribute the ML detection tasks between the parallel processing elements (data processors). For example the task distribution system may be implemented using a "farmer" processor to which a plurality of "worker" data processors are directly or indirectly coupled (in embodiments via a task buffer). In this way tasks may be distributed or farmed out by the farmer processor to the worker processors, the results being returned to the farmer. This arrangement can be employed because the decoder's operation can be partitioned into a set of ML detection tasks with a relatively low communication overhead.
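The farmer/worker arrangement with a task buffer can be sketched in software using a queue and a pool of threads. This is only a structural illustration under several assumptions: `ml_detect` is a hypothetical stand-in that finds a constrained minimum distance by exhaustive search over +1/-1 bit strings, the metric is an unscaled squared distance, and Python threads are used purely to show the pattern rather than to obtain real parallel speed-up.

```python
import queue
import threading
from itertools import product


def ml_detect(r, H, bit, value):
    """Hypothetical ML detection task: minimum squared distance with x[bit] == value."""
    n = len(H)
    best = float("inf")
    for x in product((+1, -1), repeat=n):
        if x[bit] != value:
            continue
        est = [sum(x[i] * H[i][j] for i in range(n)) for j in range(len(r))]
        best = min(best, sum((rj - ej) ** 2 for rj, ej in zip(r, est)))
    return best


def worker(tasks, results, r, H):
    """Worker: take tasks from the buffer until the sentinel None is seen."""
    while True:
        task = tasks.get()
        if task is None:
            break
        bit, value = task
        results.put((bit, value, ml_detect(r, H, bit, value)))


def farmer(r, H, n_workers=2):
    """Farmer: fill the task buffer, start the workers, and collect the metrics."""
    tasks, results = queue.Queue(), queue.Queue()
    n = len(H)
    for bit in range(n):
        for value in (+1, -1):
            tasks.put((bit, value))
    for _ in range(n_workers):
        tasks.put(None)  # one sentinel per worker
    threads = [threading.Thread(target=worker, args=(tasks, results, r, H))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    metrics = {}
    while not results.empty():
        bit, value, d = results.get()
        metrics[(bit, value)] = d
    # Bit likelihood value per bit from the pair of minimum metrics.
    return [metrics[(b, -1)] - metrics[(b, +1)] for b in range(n)]


llrs = farmer([0.7, -0.8], [[1.0, 0.2], [0.3, 1.0]])
```

Each task carries only a bit index and a bit value and returns a single metric, which reflects the low communication overhead that makes this partitioning attractive.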

In order to appreciate the advantages offered by the above described task distribution systems in combination with a plurality of data processors it is helpful to bear in mind that an ML detection task in general takes a variable time to finish processing, and thus a task distribution system (which may be implemented as a queueing mechanism) to distribute the ML detection tasks among the available data processors can facilitate efficient use of available hardware. For example, consider the case where a data processor or processing element implements an ML detector. If, say, sixteen ML detectors are needed to provide a pair of distance metrics to compute a log-likelihood ratio this might, if implemented as sixteen separate data processors, occupy an unduly large silicon area. Therefore it is advantageous to be able to reduce the number of hardware implemented ML detectors whilst, at the same time, efficiently packing the sixteen ML operations into the available hardware, taking account of a need for the overall decoding process to be completed within a signalling instant comprising a duration of a string of symbols.

The above described task distribution mechanisms are particularly advantageous when the ML detection tasks share information, since this indirectly improves the quality of the derived soft output or bit likelihood values when the number of permitted operations or time is limited.

In embodiments the distance metric may be determined dependent upon a response of a MIMO channel between transmit and receive antennas, and the decoder may comprise a MIMO decoder; in other embodiments the symbols of the transmitted string may comprise symbols transmitted by different users and the decoder may comprise a multi-user decoder or detector. In still other embodiments the decoder may comprise a block equaliser for frequency selective fading.

In a related aspect the invention provides a method of decoding a received signal, said received signal being provided by one or more transmitted signals defining a string of symbols, each said symbol comprising one or more bits, the method employing a plurality of detectors one allocated to each bit of said string, the method comprising: determining, for each bit of said string using a detector allocated to the bit, a minimum bit-dependent distance metric for a respective candidate string of symbols in which a bit has a defined value, said distance metric being dependent upon a distance of said received signal from an estimated received signal determined from said candidate string; and determining a bit likelihood value for each bit of said transmitted string dependent upon said minimum distance metrics; and wherein the method further comprises: implementing said bit-allocated detectors using a plurality of data processors, a number of said data processors being less than a number of said detectors; and controlling the distribution of said distance metric determining by said detectors to distribute said determining amongst said plurality of data processors.

In embodiments the method controls the distribution of the detectors amongst the data processors.

The invention further provides a decoder configured to implement this method, and a receiver including such a decoder.

Embodiments of the above invention provide a robust, fixed complexity soft-in/soft-out detection/decoding system. Embodiments of the described decoders and methods may be implemented as, for example, a channel decoder, a space-time decoder, a multi-user detector, and an equalizer, in either an iterative or non-iterative receiver structure. Here an iterative receiver comprises a design which has concatenated components, such as a channel decoder, space-time detector or equalizer, which iteratively provide information in terms of the a priori knowledge of the transmitted data to each other to improve the decoding/detection/equalization (or other) performance (an example is a so-called turbo decoder with iterative block (code) and channel decoding).

The skilled person will recognise that the above-described decoders and methods may be implemented using and/or embodied in processor control code. Thus in a further aspect the invention provides such code, for example on a carrier medium such as a disk, CD- or DVD-ROM, programmed memory such as read-only memory (Firmware) or on a data carrier such as an optical or electrical signal carrier. Embodiments of the invention may be implemented on a DSP (Digital Signal Processor), ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array). Thus the code may comprise conventional program code, or micro-code, or, for example, code for setting up or controlling an ASIC or FPGA. In some embodiments the code may comprise code for a hardware description language such as Verilog (Trade Mark), VHDL (Very high speed integrated circuit Hardware Description Language), or SystemC. As the skilled person will appreciate, processor control code for embodiments of the invention may be distributed between a plurality of coupled components in communication with one another.

These and other aspects of the invention will now be further described, by way of example only, with reference to the accompanying figures in which:

Figure 1 shows an example of a MIMO space-time coded communications system;

Figure 2 shows a block diagram of a first embodiment of a decoder in accordance with an aspect of the present invention;

Figures 3a and 3b show block diagrams of respective second and third embodiments of a decoder in accordance with an aspect of the present invention;

Figure 4 shows a flow diagram of a sphere decoder for use with embodiments of the present invention;

Figure 5 shows a block diagram illustrating information sharing between ML detectors in the decoders of Figures 2 and 3;

Figure 6 shows a timeline of ML detection operations for stacked bit-wise ML detectors implemented using P parallel data processors;

Figure 7 illustrates an architecture implementing stacked ML detection operations in a max-log-MAP decoder comprising multiple ML detectors;

Figure 8 shows curves of bit error rate against signal-to-noise ratio (Eb/No) for list sphere decoders (LSD) and embodiments of max-log-MAP sphere decoders (MLMSD) in accordance with an aspect of the present invention;

Figure 9 shows a receiver incorporating an embodiment of a decoder in accordance with an aspect of the present invention;

Figures 10a to 10b show, respectively, an example of a parallel processor architecture for the receiver of figure 9, and flow diagrams of a task distribution system;

Figure 11 shows a block diagram of a transmitter with concatenated encoders;

Figure 12 shows a block diagram of a receiver with concatenated decoders for use with the transmitter of figure 11;

Figure 13 shows a block diagram of a receiver with concatenated decoders and iterative decoding for use with the transmitter of figure 11; and

Figure 14 shows a block diagram of a receiver employing iterative feedback between two equivalent decoders.

Some preferred embodiments of the invention employ sphere decoders for bitwise ML detection for determining distance metrics for bits of a string of symbols and/or for determining a distance metric for a complete ML string of symbols (although the invention is not limited to the use of sphere decoding). It is therefore helpful for understanding the invention to describe some aspects of sphere decoding in detail.

Consider a space-time transmission scheme with $n_T$ transmitted and $n_R$ received signals, for example in a MIMO communications system with $n_T$ transmit and $n_R$ receive antennas. The $1 \times n_R$ received signal vector at each time instant $k$ is given by:

$$r_k = s_k H_k + v_k \qquad \text{Equation 1}$$

where $s_k = [s_k^1 \ldots s_k^{n_T}]$ denotes the transmitted vector whose entries are chosen from some complex constellation $C$ with $M = 2^q$ possible signal points and $q$ is the number of bits per constellation symbol. The AWGN (Additive White Gaussian Noise) vector $v_k$ is a $1 \times n_R$ vector of independent, zero-mean complex Gaussian noise entries with variance of $\sigma^2$ per real component. The notation $H_k$ denotes an $n_T \times n_R$ multiple-input/multiple-output (MIMO) channel matrix assumed to be known or estimated at the receiver, with $n$-th row and $m$-th column components $h_{n,m}$, $n = 1, \ldots, n_T$, $m = 1, \ldots, n_R$, representing the narrowband flat fading between the $n$-th transmitted signal and $m$-th received signal. The channel fade may be assumed to be constant over a symbol period.

In a receiver a MIMO channel estimate $H_k$ can be obtained in a conventional manner using a training sequence. For example a training sequence can be transmitted from each transmit antenna in turn (to avoid interference problems), each time listening on all the receive antennas to characterise the channels from that transmit antenna to the receive antennas. (This need not constitute a significant overhead and data rates are high in between training and, for example, with slowly changing indoor channels training may only be performed every, say, 0.1 seconds). Alternatively orthogonal sequences may be transmitted simultaneously from all the transmit antennas, although this increases the complexity of the training as interference problems can then arise.

All linear space-time block coded transmission schemes can be written in the form of Equation 1. For example, BLAST (G. J. Foschini, "Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas," Bell Labs. Tech. J., vol. 1, no. 2, pp. 41-59, 1996) uses the transmit antennas to send a layered structure of signals, and therefore $n_T$ represents the number of transmit antennas, $n_R$ represents the number of receive antennas and $H_k$ is the true MIMO channel matrix. Other examples include orthogonal designs (S. M. Alamouti, "A simple transmitter diversity scheme for wireless communications," IEEE J. Sel. Area Comm., pp. 1451-1458, Oct. 1998; and V. Tarokh, H. Jafarkhani and A. R. Calderbank, "Space-time block codes from orthogonal designs," IEEE Trans. Info. Theory., vol. 45, pp. 1456-1467, July 1999) and linear dispersive codes (B. Hassibi and B. Hochwald, "High-rate codes that are linear in space and time," IEEE Trans. Info. Theory., vol. 48, pp. 1804-1824, Jul. 2002), where $H_k$ is an effective channel derived from one or more uses of the true channel.

Equation 1 may also be used to represent a CDMA system where the multiuser detector estimates the signal $s_k$ transmitted from different users and matrix $H_k$ represents the combined spreading and channel effects for all users.

Ignoring the time index $k$ for simplicity of discussion, the $n$-th component of the transmitted symbol $s$ is obtained using the symbol mapping function

$$s^n = \mathrm{map}(x^n), \quad n = 1, \ldots, n_T \qquad \text{Equation 2}$$

where

$$x^n = [x_1^n \ldots x_q^n] \qquad \text{Equation 3}$$

is a vector with $q$ transmitted data bits, and $q$ is the number of bits per constellation symbol. (More generally, however, $s$ denotes a string of symbols encoded over space and/or time and/or frequency and $n$ runs over the length of the string). Therefore the $(q \cdot n_T)$-length vector of bits transmitted can be denoted by

$$x = [x^1 \ldots x^{n_T}] \qquad \text{Equation 4}$$

and the transmitted vector constellation is written as $s = \mathrm{map}(x)$.

The complex matrix representation of Equation 1 (ignoring the time index $k$) can be transformed to a real matrix representation with twice the dimension of the original system as follows:

$$r = sH + v \qquad \text{Equation 5}$$

where, with a tilde marking the corresponding complex-valued quantity of Equation 1,

$$r = [\Re\{\tilde r\} \;\; \Im\{\tilde r\}] \qquad \text{Equation 6}$$

$$s = [\Re\{\tilde s\} \;\; \Im\{\tilde s\}] \qquad \text{Equation 7}$$

$$H = \begin{bmatrix} \Re\{\tilde H\} & \Im\{\tilde H\} \\ -\Im\{\tilde H\} & \Re\{\tilde H\} \end{bmatrix} \qquad \text{Equation 8}$$

$$v = [\Re\{\tilde v\} \;\; \Im\{\tilde v\}] \qquad \text{Equation 9}$$

We shall use the real-valued representation of Equation 5 to Equation 9 in the following discussion so that, for example, $r$ and $s$ are real vectors and $H$ is a real matrix.
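As a quick numerical check of Equations 5 to 9, the following sketch builds the real-valued vector and matrix from a small complex example and confirms that the real-valued product reproduces the real and imaginary parts of the complex product. The block layout assumed for the real matrix, [[Re H, Im H], [-Im H, Re H]], is the usual one for the row-vector model r = sH; the function names and example values are illustrative only.

```python
def real_representation(s, H):
    """Return (s_real, H_real) for the row-vector model r = s H.

    s is a list of complex symbols, H a list of rows of complex gains.
    """
    n_t, n_r = len(s), len(H[0])
    s_real = [z.real for z in s] + [z.imag for z in s]
    # Upper block row: [Re H, Im H]; lower block row: [-Im H, Re H].
    top = [[H[n][m].real for m in range(n_r)] + [H[n][m].imag for m in range(n_r)]
           for n in range(n_t)]
    bottom = [[-H[n][m].imag for m in range(n_r)] + [H[n][m].real for m in range(n_r)]
              for n in range(n_t)]
    return s_real, top + bottom


def row_times_matrix(row, mat):
    """Row vector times matrix, valid for real or complex entries."""
    return [sum(row[i] * mat[i][j] for i in range(len(row)))
            for j in range(len(mat[0]))]


# Arbitrary noiseless example with two transmit and two receive antennas.
s = [1 + 2j, -3 + 1j]
H = [[0.5 - 1j, 2 + 0.25j], [1 + 1j, -0.5j]]

r_complex = row_times_matrix(s, H)
s_real, H_real = real_representation(s, H)
r_real = row_times_matrix(s_real, H_real)
expected = [z.real for z in r_complex] + [z.imag for z in r_complex]
```

The real system has twice the dimension of the complex one (here 4 instead of 2), which is why the number of unknowns doubles when in-phase and quadrature components are estimated separately.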

The maximum a posteriori probability (APP) bit detection, conditioned on the received signal $r$, can be expressed in log likelihood ratio (LLR) terms as follows:

$$
\begin{aligned}
L_P(x_j^n \mid r) &= \ln \frac{P(x_j^n = +1 \mid r)}{P(x_j^n = -1 \mid r)} \\
&= \ln \frac{\sum_{x \in X_{n,j}^{+1},\, s = \mathrm{map}(x)} \exp\left(-\frac{1}{2\sigma^2}\|r - sH\|^2 + \frac{1}{2} x^T L_A\right)}{\sum_{x \in X_{n,j}^{-1},\, s = \mathrm{map}(x)} \exp\left(-\frac{1}{2\sigma^2}\|r - sH\|^2 + \frac{1}{2} x^T L_A\right)} \\
&= L_A(x_j^n) + \underbrace{\ln \frac{\sum_{x \in X_{n,j}^{+1},\, s = \mathrm{map}(x)} \exp\left(-\frac{1}{2\sigma^2}\|r - sH\|^2 + \frac{1}{2} x_{[n,j]}^T L_{A,[n,j]}\right)}{\sum_{x \in X_{n,j}^{-1},\, s = \mathrm{map}(x)} \exp\left(-\frac{1}{2\sigma^2}\|r - sH\|^2 + \frac{1}{2} x_{[n,j]}^T L_{A,[n,j]}\right)}}_{L_E(x_j^n \mid r)}
\end{aligned}
$$

$$n = 1, \ldots, n_T, \quad j = 1, \ldots, q \qquad \text{Equation 10}$$

where $x$ is a sequence of possible transmitted bits, $L_A$ is a vector of $L_A$-values of $x$, $s$ is a vector of possible transmitted symbols, i.e. $s = \mathrm{map}(x)$, the function $\mathrm{map}(\cdot)$ providing the mapping from bits to symbol, $x_{[n,j]}$ denotes the bit-wise sub-vector of $x$ obtained by omitting its element $x_j^n$ and $L_{A,[n,j]}$ denotes the vector of all $L_A$-values, also omitting the element corresponding to bit $x_j^n$; and where $\|\cdot\|$ denotes the Euclidean norm. As previously mentioned, $j$ indexes a bit in a constellation symbol of $q$ bits. For space-time detection $n$ is a spatial index to one of $n_T$ spatially multiplexed transmissions; in a multi-user system $n$ indexes one of a plurality ($n_T$) of users.

The set $X_{n,j}^{+1}$ is the set of $2^{q n_T - 1}$ bit vectors $x$ having $x_j^n = +1$, i.e. $X_{n,j}^{+1} = \{x \mid x_j^n = +1\}$ and $X_{n,j}^{-1} = \{x \mid x_j^n = -1\}$. The symbol vector $s$ is the mapping of the possible transmitted bit vector $x$. The functions $L_P(\cdot)$, $L_A(\cdot)$ and $L_E(\cdot)$ denote the a posteriori, a priori and extrinsic log likelihood ratio respectively.

According to Equation 10 APP detection requires an exhaustive evaluation of $2^{q n_T}$ distance metrics $\|r - sH\|^2$, corresponding to the number of elements in the sets $X_{n,j}^{+1}$ and $X_{n,j}^{-1}$. The computational complexity of APP detection increases exponentially with the number of bits per symbol $q$ and the number of spatially multiplexed transmitted symbols $n_T$.

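However the APP LLR (log likelihood ratio) can be approximated using the max-log approximation of Equation 10 for each bit $x_j^n$ as follows:

$$
L_P(x_j^n \mid r) \approx \frac{1}{2} \max_{x \in X_{n,j}^{+1},\, s = \mathrm{map}(x)} \left\{ -\frac{1}{\sigma^2}\|r - sH\|^2 + x^T L_A \right\} - \frac{1}{2} \max_{x \in X_{n,j}^{-1},\, s = \mathrm{map}(x)} \left\{ -\frac{1}{\sigma^2}\|r - sH\|^2 + x^T L_A \right\}
$$

$$
L_E(x_j^n \mid r) \approx \frac{1}{2} \max_{x \in X_{n,j}^{+1},\, s = \mathrm{map}(x)} \left\{ -\frac{1}{\sigma^2}\|r - sH\|^2 + x_{[n,j]}^T L_{A,[n,j]} \right\} - \frac{1}{2} \max_{x \in X_{n,j}^{-1},\, s = \mathrm{map}(x)} \left\{ -\frac{1}{\sigma^2}\|r - sH\|^2 + x_{[n,j]}^T L_{A,[n,j]} \right\}
$$

$$n = 1, \ldots, n_T, \quad j = 1, \ldots, q \qquad \text{Equation 11}$$

One can then search for the candidates that provide the $\max\{\cdot\}$ term for $x \in X_{n,j}^{+1}$ and $x \in X_{n,j}^{-1}$ for each transmitted bit without exhaustively evaluating the term $-\frac{1}{\sigma^2}\|r - sH\|^2 + x^T L_A$ or $-\frac{1}{\sigma^2}\|r - sH\|^2 + x_{[n,j]}^T L_{A,[n,j]}$ for all possible $s$. Note that since there are $(q \cdot n_T)$ transmitted bits, there are $(q \cdot n_T)$ operations which evaluate Equation 11.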
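To make the difference between the exhaustive APP computation and its max-log approximation concrete, the sketch below evaluates both by brute force for a small real-valued system. It assumes zero a priori information (so the a posteriori and extrinsic values coincide), two transmitted 4-PAM symbols using the bit mapping of the worked example given later in this description, and an arbitrary illustrative channel matrix and received vector; none of the names are taken from the source.

```python
import itertools
import math

# Bit-pair to 4-PAM symbol mapping, as in the worked example later in the text.
PAM4 = {(+1, +1): 3, (+1, -1): 1, (-1, -1): -1, (-1, +1): -3}
Q = 2  # bits per constellation symbol


def metric(r, s, H):
    """Squared Euclidean distance ||r - sH||^2 for row vectors r and s."""
    return sum((r[m] - sum(s[n] * H[n][m] for n in range(len(s)))) ** 2
               for m in range(len(r)))


def bitwise_llrs(r, H, sigma2):
    """Return (exact, max_log) LLR lists for all bits, assuming L_A = 0."""
    n_t = len(H)
    candidates = []
    for bits in itertools.product((+1, -1), repeat=Q * n_t):
        s = [PAM4[bits[Q * n:Q * n + Q]] for n in range(n_t)]
        candidates.append((bits, metric(r, s, H)))
    exact, max_log = [], []
    for i in range(Q * n_t):
        plus = [d for b, d in candidates if b[i] == +1]
        minus = [d for b, d in candidates if b[i] == -1]
        # Exhaustive form: sum over every candidate in each set.
        num = sum(math.exp(-d / (2 * sigma2)) for d in plus)
        den = sum(math.exp(-d / (2 * sigma2)) for d in minus)
        exact.append(math.log(num / den))
        # Max-log form: only the best candidate of each set matters.
        max_log.append((min(minus) - min(plus)) / (2 * sigma2))
    return exact, max_log


H = [[1.0, 0.3], [-0.2, 0.9]]   # illustrative real channel
r = [2.7, 1.9]                  # roughly [3, 1] * H plus a small disturbance
exact, max_log = bitwise_llrs(r, H, sigma2=0.5)
```

In this example only the smallest metric in each of the two sets contributes to the max-log value, which is what allows a hard (ML) detector per set to replace the exhaustive sums.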

We have previously described, in UK patent applications 0323211.3 and 0416820.9 (ibid), how hard or maximum likelihood (ML) detection for the pair of sets $x \in X_{n,j}^{+1}$ and $x \in X_{n,j}^{-1}$ may be employed to find the maximum metrics $-\frac{1}{\sigma^2}\|r - sH\|^2 + x^T L_A$ or $-\frac{1}{\sigma^2}\|r - sH\|^2 + x_{[n,j]}^T L_{A,[n,j]}$ for each set in order to derive the LLRs according to Equation 11. An embodiment of that invention used a sphere decoder search algorithm (see, for example, Viterbo and Boutros, ibid, hereby incorporated by reference) to search for the candidates $s$ that satisfy the condition

$$\|r - sH\|^2 - \sigma^2 x^T L_A \le \rho^2 \qquad \text{Equation 12}$$

Broadly speaking, for every candidate found, the bound $\rho^2$ is reduced until one candidate is found that satisfies the minimum metric $\|r - sH\|^2 - \sigma^2 x^T L_A$ for a particular bit.

The sphere decoding procedure is well known to the skilled person. In outline, the procedure comprises three main processes:

i) Transformation of the multiple-input-multiple-output (MIMO) channel into a lattice representation.

ii) The search procedure, which searches for the nearest lattice point to the received signal in the case of hard detection or the set of lattice points around the received signal in the case of soft detection. Where a soft input is available, providing an a priori probability of a transmitted symbol or codeword, this can be utilised to assist the search (see also, for example, H. Vikalo and B. Hassibi, "Low-Complexity Iterative Detection and Decoding of Multi-Antenna Systems Employing Channel and Space-Time Codes," Conference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, vol. 1, Nov. 3-6, 2002, pp. 294-298; and H. Vikalo and B. Hassibi, "Towards Closing the Capacity Gap on Multiple Antenna Channels", ICASSP'02, vol. 3, pp. III-2385 - III-2388).

iii) Where a soft output is needed, providing the soft output based on the soft input and the set of lattice points found in the search region.

An $n$-dimensional lattice can be decomposed into $(n-1)$-dimensional layers. The search algorithm for an $n$-dimensional lattice can be described recursively as a finite number of $(n-1)$-dimensional search algorithms. Viterbo and Boutros (ibid) described the search algorithm in terms of three different states, or cases, of the search:

Case A: The $n$-th dimensional layer is within the search bound: the layer is decomposed into $(n-1)$-th dimensional layers.

Case B: The search successfully reaches the zero-dimensional layer and a lattice point in the search region is found.

Case C: The $n$-th dimensional layer is not within the search bound: the search moves up one step in the hierarchy of layers.

Table 1

Broadly speaking the lattice search involves selecting a candidate symbol for a string of symbols (vector s), testing an inequality to determine whether the k-th dimensional "layer" is within the search bound, and if so selecting the next symbol. After a component of a vector s that satisfies the distance metric is found its contribution is subtracted. In this way the search, in effect, constructs a tree with one or more lattice points at the end node(s) ("zero dimensional layer"). The lattice point with the smallest distance metric provides the hard decision output (for that metric).

The search procedure is simplified if the lower triangular matrix $U^T$, derived from QR decomposition or Cholesky factorization (sometimes referred to as taking the square root of a matrix) of the channel matrix, is used as the generator matrix. For example, if QR decomposition is used (see, for example, G.H. Golub and C.F. van Loan, Matrix Computations, Johns Hopkins University Press, 1983), the lower triangular matrix $U^T$ (and upper $U$) are defined by $U^T U = H^T H$.
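The tree search outlined above can be sketched as a short depth-first routine. This is a minimal illustration rather than the procedure of Figure 4: it assumes a real-valued model r = sH in which H has already been reduced to a lower triangular matrix with non-zero diagonal, it ignores a priori information, and the function and variable names are invented for the example. Comments mark where Cases A, B and C of Table 1 arise.

```python
def sphere_decode(r, H, constellation, radius_sq=float("inf")):
    """Hard-decision search for the s minimising ||r - sH||^2.

    H is lower triangular (H[n][m] == 0 for m > n), so component n of r
    depends only on symbols n, n+1, ..., N-1.
    """
    N = len(r)
    best = {"dist": radius_sq, "s": None}
    s = [0] * N

    def search(n, partial):
        # Subtract the contribution of the symbols already fixed (n+1..N-1).
        interference = sum(s[k] * H[k][n] for k in range(n + 1, N))
        target = r[n] - interference
        centre = target / H[n][n]
        # Try symbols in order of increasing distance from the centre.
        for sym in sorted(constellation, key=lambda c: abs(c - centre)):
            d = partial + (target - sym * H[n][n]) ** 2
            if d >= best["dist"]:
                break  # Case C: outside the bound, move back up a layer
            s[n] = sym
            if n == 0:
                # Case B: a lattice point is found, shrink the bound
                best["dist"], best["s"] = d, list(s)
            else:
                search(n - 1, d)  # Case A: descend into the next layer

    search(N - 1, 0.0)
    return best["s"], best["dist"]


C = (-3, -1, 1, 3)
H = [[1.0, 0.0], [-0.2, 0.9]]   # illustrative lower triangular channel
r = [2.6, 1.1]                  # near [3, 1] * H = [2.8, 0.9]
s_hat, d_hat = sphere_decode(r, H, C)
```

Because the bound shrinks every time a lattice point is found, later branches are pruned earlier; a tighter initial `radius_sq` has the same effect from the start of the search.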

In UK patent applications 0323211.3 and 0416820.9 we employed a max-log-MAP (maximum a posteriori probability) approach by searching for the two candidates that satisfy the $\max\{\cdot\}$ term in Equation 11. The search procedure was performed for every bit $x_j^n$ to find the two candidates that satisfy the following optimisations:

$$s_+ = \arg\min_{x \in X_{n,j}^{+1},\, s = \mathrm{map}(x)} \left\{ \|r - sH\|^2 - \sigma^2 x^T L_A \right\} \quad \text{for bit } x_j^n = +1$$

and

$$s_- = \arg\min_{x \in X_{n,j}^{-1},\, s = \mathrm{map}(x)} \left\{ \|r - sH\|^2 - \sigma^2 x^T L_A \right\} \quad \text{for bit } x_j^n = -1,$$

where $n = 1, \ldots, n_T$ and $j = 1, \ldots, q$. The corresponding distance metrics were obtained for the two candidates, $d_{n,j,+}^2$ and $d_{n,j,-}^2$, where

$$d_{n,j,+}^2 = \|r - s_+ H\|^2 - \sigma^2 x_+^T L_{A+} \qquad \text{Equation 13}$$

and

$$d_{n,j,-}^2 = \|r - s_- H\|^2 - \sigma^2 x_-^T L_{A-} \qquad \text{Equation 14}$$

The vectors $x_+$, $x_-$ and $L_{A+}$, $L_{A-}$ correspond to the bit sequences and a priori information of the symbols $s_+$ and $s_-$.

Therefore, the max-log-MAP approximation of the extrinsic LLR (log likelihood ratio) value of a bit is given by:

$$L_E(x_j^n \mid r) \approx \frac{1}{2\sigma^2}\left(-d_{n,j,+}^2 + d_{n,j,-}^2\right) \qquad \text{Equation 15}$$

The relationship between $L_P$ and $L_E$ is given by $L_P = L_A + L_E$.

We now describe a system in which, in order to reduce the computational complexity further, each maximum likelihood (ML) detector cooperates in its derivation of the metrics $-\frac{1}{\sigma^2}\|r - sH\|^2 + x^T L_A$ for a posteriori information or $-\frac{1}{\sigma^2}\|r - sH\|^2 + x_{[n,j]}^T L_{A,[n,j]}$ for extrinsic information. For most types of ML detector, the detector performs a set of operations iteratively in order to provide its solution. For example, a maximum likelihood sphere decoder iteratively performs its search to find the maximum likelihood symbol candidate. Other types of maximum likelihood decoder which may be employed in embodiments of the invention include the branch and bound decoder (Luo, ibid), the genetic algorithm decoder (Juntti, ibid), the linear BER decoder and the like. In embodiments, a maximum likelihood decoder is able to exploit information from another decoder during its iterative processing. This sharing of information can be performed sequentially or in parallel depending upon the hardware or processor limitations.

Block diagrams showing example implementations of max-log MAP decoders embodying this type of information sharing are shown in Figures 2 and 3. We will also describe how cooperative ML detectors can share available hardware resources through intelligent scheduling operations.

Referring to Figure 2, this shows a block diagram of an embodiment of a max-log MAP decoder 200 configured to determine bit likelihood values in accordance with a max-log approximation, in particular the approximation of Equation 15, in which ML detectors share information. The decoder 200 comprises a plurality of hard detectors (or decoders) 202a-c, 204a-c, each configured to determine a distance metric $d_{n,j,+}^2$, $d_{n,j,-}^2$ (for $n = 1$ to $n_T$; for $j = 1$ to $q$) for a possible value of a particular bit $x_j^n$, +1 for detectors 202, -1 for detectors 204, according to respective equations 13 and 14, based upon input values for $r$, $H$, $\sigma^2$ (noise variance) and, where available, a priori information $L_A(x)$. Each of these detectors 202,204 provides a distance metric value $d_{n,j,+}^2$, $d_{n,j,-}^2$ to an output stage 206 that determines a bit likelihood value for each bit of the transmitted string of symbols, for example according to Equation 15. The likelihood values may comprise "extrinsic" and/or a posteriori bit likelihood values (in the above notation $L_E(x)$ or $L_P(x)$). In the decoder 200 of Figure 2 the shared information is provided by a common memory area or register 208 to which detectors 202,204 have read/write access, although in other embodiments other means of sharing information may be employed.

As described in more detail later the detectors 202,204 may be implemented in series, for example as repeated instances of a software process, or in parallel, or in embodiments in a combination of serial and parallel processes employing multiple instances of ML detection hardware, but not necessarily one instance per detector 202,204 so that the detection hardware is shared between multiple detectors. The sharing of hardware resources is facilitated by intelligent scheduling of operations, as described further below. Similar approaches may also be employed in the decoders of Figures 3a and 3b.

The noise variance may be obtained in any convenient manner, depending upon the overall system design. For example, the noise variance may be obtained during the training period where the channel impulse response is estimated. During the training period, the transmitted symbol sequence is known. Together with the estimated channel impulse response, the 'noiseless' received signal is obtained. The noise variance may be estimated by evaluating the noise statistics of the sequence of received signals during the training period, knowing the sequence of 'noiseless' received signals.
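A minimal sketch of this estimate follows. It assumes a real-valued, single-tap (flat) channel with a known estimate `h`, zero-mean noise, and a made-up training sequence; with a multi-tap or MIMO channel the 'noiseless' signal would instead be formed by the corresponding convolution or matrix product. All names and values are illustrative.

```python
def estimate_noise_variance(received, training, h):
    """Mean squared residual between received and 'noiseless' training samples."""
    # 'Noiseless' received signal: known training symbol through the estimated channel.
    residuals = [r - h * t for r, t in zip(received, training)]
    return sum(e * e for e in residuals) / len(residuals)


training = [1, -1, 1, 1, -1, -1, 1, -1]                  # known symbols
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.2]    # unknown at the receiver
h = 0.8                                                  # estimated channel gain
received = [h * t + w for t, w in zip(training, noise)]

sigma2_hat = estimate_noise_variance(received, training, h)
```

Here the estimate equals the mean square of the injected noise samples, 0.020625, because the channel estimate is taken to be exact.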

A detector 202,204 need only provide a hard output, that is an output identifying a most likely candidate with a particular bit value $x_j^n$ being +1 or -1 and/or providing a minimum distance metric $d_{n,j,+}^2$ or $d_{n,j,-}^2$. Thus the skilled person will appreciate that the arrangement of Figure 2 may employ any maximum likelihood hard detectors/decoders that can provide the appropriate distance metrics. However in preferred embodiments, hard detectors 202,204 are implemented using one or more sphere decoders.

For the received vector $r$, either candidate $s_+$ or $s_-$ is the maximum likelihood estimate $s_{ML}$ - that is, the maximum likelihood solution provides one set of bit values $x_{ML}$ and corresponding distance metrics $d_{ML}^2$. Thus maximum likelihood sphere decoding can be performed first and the bit-wise sphere decoding may then be performed to obtain the distance metrics, $d_{n,j,\overline{ML}}^2$, for the bit values which do not correspond to the maximum likelihood symbol estimate.

Figure 3a shows a block diagram of a max-log decoder 300 configured to determine bit likelihood values in this way, employing sphere decoders as hard detectors, and in which ML detectors again share information.

In Figure 3a hard detection blocks 304a-c and output stage 306 correspond to a combination of detectors 202 and 204 which correspond to the set of non-maximum-likelihood bit sequences $x \in X_{n,j}^{\overline{ML}}$, and to output stage 206 of Figure 2 respectively. An additional hard detector 302, preferably a sphere decoder, determines a maximum likelihood symbol string estimate $s_{ML}$ (for $x \in X$, where $X = X_{n,j}^{+1} \cup X_{n,j}^{-1}$) and a demodulator (not shown in Figure 3a) converts this symbol estimate to a bitwise estimate $x_{ML}$ for bitwise detectors 304 so that each may fix the bit with which it is associated to be the inverse of the corresponding bit in the ML symbol string. Hard detection sphere decoder 302 also provides a corresponding bit likelihood value $d_{ML}^2$ (common to all the bits of $x_{ML}$) for the max-log MAP calculation. Again the detectors may be implemented in series or parallel or in a combination of the two.

The bitwise detectors 304, and preferably also ML detector 302, share information via common register or memory area 308.

Figure 3b shows a block diagram of another embodiment of a two-stage max-log-MAP information-sharing sphere decoder 310, in which similar elements to those of Figure 3a are indicated by like reference numerals. In the decoder of Figure 3b the first stage comprises two maximum likelihood decoders, such as hard detection sphere decoders, each configured to determine a minimum distance metric for a predetermined bit (the first bit), one of these maximum likelihood decoders determining a distance metric for the predetermined bit (the first bit) having a first logic level, one for the bit having a second logic level (i.e. for $x \in X_{1,1}^{+1}$ and for $x \in X_{1,1}^{-1}$). Here, a string of symbols corresponding to the shorter distance metric provides the maximum likelihood transmitted symbols - in other words the ML symbol string estimate $s_{ML}$ output from detectors 302 with $d_{ML}^2 = \min\{d_{1,1,+}^2, d_{1,1,-}^2\}$ is selected and converted to a bitwise estimate $x_{ML}$. The second stage comprises further maximum likelihood detectors, provided for each other (subsequent) bit, each of these maximum likelihood decoders determining a distance metric for a value of the bit different to its value in the maximum likelihood string of symbols. Again either or both stages of the maximum likelihood detector may be implemented using serial, parallel or serial/parallel processing.

Figure 4 shows a flow diagram 400 of a sphere decoding procedure for implementing a maximum likelihood hard detector such as one of detectors 202,204 of Figure 2 or one of the detectors of Figure 3a or 3b. The procedure is a modification of a conventional sphere decoding procedure and, in particular, of the distance metric calculation at step 402. In a conventional procedure (for example to implement maximum likelihood sphere decoder 302 of Figure 3a) that does not consider the available soft input, the final term in the update of the distance metric performed at step 402 is missing. In the procedure of Figure 4 the function $L(\cdot)$ provides the a priori LLR term in the distance metric given in Equation 12 for symbol $s_n$, that is $L(s_n, L_A(x_n), \sigma) = \sigma^2 x_n^T L_{A,n}$, and $x_n$ is obtained from the relationship $s_n = \mathrm{map}(x_n)$.

Referring in more detail to Figure 4, the generator matrix of the lattice $H$ ($F = H^{-1}$ where $F$ is a triangular matrix and $H$ is pre-processed to be a triangular matrix, for example using QR decomposition) is the lattice representation of the communication system and the received signal is $r$ (pre-processed in the same way as the generator matrix for the search procedure); $L_A(x)$ comprises the a priori LLR values and $\sigma^2$ is the noise variance. The outputs of the procedure are $s_{ML}$ and $d_{ML}^2$, where the sphere decoding is performed for the set of bit sequences in the set $X$, $X_{n,j}^{\overline{ML}}$, $X_{n,j}^{+1}$ or $X_{n,j}^{-1}$. The output $s_{ML}$ is the lattice input corresponding to the lattice point closest to the received signal $r$ and is the maximum likelihood solution. The output $d_{ML}^2$ is the distance metric corresponding to the lattice input $s_{ML}$. The output $d_{ML}^2$ refers to the sphere decoding output $d_{n,j,+}^2$ for $x \in X_{n,j}^{+1}$ in 202, $d_{n,j,-}^2$ for $x \in X_{n,j}^{-1}$ in 204, $d_{ML}^2$ for $x \in X$ in 302 and $d_{n,j,\overline{ML}}^2$ for $x \in X_{n,j}^{\overline{ML}}$ in 304. The search region is defined by the search radius $\rho^2$. For most applications the variable initial bestdist is assigned to a large value.

The function SortedList($e_{n,n}$) provides an ordered set of lattice inputs to be searched according to an increasing distance from $e_{n,n}$, and $M$ is the number of lattice inputs (the number of possible symbols in the constellation) to be searched through and is the length of the vector $\mathrm{slist}_n$ (slist is an $N \times M$ matrix, and $\mathrm{step}_n$ counts from 1 to $M$). The ordering may thus be performed using a look-up table storing all the possible combinations, for example, using a $c \times M$ matrix $\Phi$ where $c = 2M$ is the number of symbol search combinations. The sorted vector slist for the zero-forcing solution $s^{ZF}$ (where the zero-forcing solution is given by $s^{ZF} = (H^T H)^{-1} H^T r$) is given as the $i$-th row of $\Phi$, $\mathrm{slist} = \Phi(i)$, where $i = \lceil s_n^{ZF} \rceil + M - 1$ and $\lceil \cdot \rceil$ denotes rounding towards infinity. The notation $\mathrm{slist}_{n,i}$ refers to the $i$-th element of the vector $\mathrm{slist}_n$. Broadly speaking this technique comprises a modified version of the Schnorr-Euchner strategy described in Agrell et al. (ibid). The zero-forcing solution at the $n$-th dimensional search is given by $e_n := rF$. The number of unknowns (length of the string of symbols to be estimated) is $N$ (bearing in mind that where I and Q components are to be estimated there are two unknowns per symbol so that the number of unknowns doubles).
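The ordering step can be illustrated with a small sketch. The first function sorts the constellation directly by distance from a centre value; the second builds a look-up table with 2M rows for an M-PAM constellation with points -(M-1), ..., M-1, so that the row can be selected by rounding the centre upwards. The row indexing used here (rounding up, adding M - 1, and clamping to the table) is an assumption made for the example, as are the function names.

```python
import math


def sorted_list(e, constellation):
    """Constellation points ordered by increasing distance from e."""
    return sorted(constellation, key=lambda c: abs(c - e))


def build_table(M):
    """2M precomputed orderings for M-PAM; row i covers e in (i - M, i - M + 1]."""
    constellation = [2 * k - (M - 1) for k in range(M)]
    # The ordering is constant inside each unit interval, so its midpoint
    # serves as a representative centre for the whole row.
    return [sorted_list(i - M + 0.5, constellation) for i in range(2 * M)]


def lookup(e, table):
    """Select the precomputed ordering for centre e (index clamped to the table)."""
    M = len(table) // 2
    i = min(max(math.ceil(e) + M - 1, 0), 2 * M - 1)
    return table[i]


TABLE = build_table(4)          # 8 rows for 4-PAM {-3, -1, 1, 3}
order = lookup(0.6, TABLE)      # nearest symbol first
```

The table trades a sort per tree node for a single rounding operation and an index, which is the attraction of the look-up approach in a hardware implementation.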

The three Cases A, B and C are as described above; broadly speaking the procedure initialises $n = N$ and examines symbols, preferably in slist order, until all have been examined (examined all is true when all symbols in $\mathrm{slist}_n$ have been examined at the $n$-th dimensional search), moving up a layer (Case C) when outside the search radius $\rho^2$ and finishing when back at the top of the tree (when $n$ is equal to $N$).

Methods for ordering the symbols to be searched using a look-up table are described in more detail in A. Wiesel, X. Mestre, A. Pages and J. R. Fonollosa, "Efficient Implementation of Sphere Demodulation", Proceedings of IV IEEE Signal Processing Advances in Wireless Communications, pp. 535, Rome, June 15-18, 2003, which is hereby incorporated by reference.

We will next describe the sharing of information in the above decoders. The precise information that is shared depends upon the algorithm(s) employed by detectors 202, 204, 302, 304 in the decoders of Figures 2 and 3. However it is normally advantageous for an ML detector to share with other detectors information specifying the searched (strings of) symbols and the corresponding distance metrics, so that this information can be exploited by the other detectors to reduce their search space. We will particularly describe an example of such information sharing when the ML detectors comprise sphere decoders in which, as previously described, the search radius determines the search space (amount of searching required) and hence also the computational complexity.

As an example we take the case of a spatially-multiplexed transmission system with two transmit antennas and a 4-PAM (Pulse Amplitude Modulation) symbol constellation, $C = \{-3, -1, 1, 3\}$, corresponding to the symbol mapping of the bits {-1 +1, -1 -1, +1 -1, +1 +1}, that is: map({+1, +1}) = +3, map({+1, -1}) = +1, map({-1, -1}) = -1, map({-1, +1}) = -3.

There are two bits per symbol, $n_T = 2$, and thus 2 x 2 = 4 bits to jointly detect.

Now consider, say, the decoder of Figure 3a. Assume that the ML symbol detector 302 detects an ML string of symbols $s_{ML} = \{+3, +1\}$ corresponding to the bit sequence $x_{ML} = \{+1, +1, +1, -1\}$, and having distance metric $d_{ML}^2$. In this case the bit-wise ML detector 304a for the first bit "detects" (searches for) an ML (string of) symbols with the first bit set to the inverse of its value in $x_{ML}$, that is -1 ($x \in X_{1,1}^{-1}$), to obtain $d_{1,1,\overline{ML}}^2 = d_{1,1,-}^2$. The other ML detectors 304 perform bit-wise sphere decoding for the sets $X_{1,2}^{-1}$, $X_{2,1}^{-1}$ and $X_{2,2}^{+1}$ respectively, to obtain $d_{1,2,\overline{ML}}^2 = d_{1,2,-}^2$, $d_{2,1,\overline{ML}}^2 = d_{2,1,-}^2$ and $d_{2,2,\overline{ML}}^2 = d_{2,2,+}^2$; $d_{1,1,+}^2 = d_{1,2,+}^2 = d_{2,1,+}^2 = d_{2,2,-}^2 = d_{ML}^2$.

We assume that the set of ML detectors 304 is implemented in parallel. The bit-wise detector for the $i$-th bit ($i = 1, \ldots, q n_T$) is constrained by the requirement that the $i$-th bit have a fixed value, in this case the inverse of its value in the ML string of symbols $s_{ML}$.

Let us consider the first bit-wise ML detector 304a ($i = 1$). For the above example where $s_{ML} = \{+3, +1\}$, $x_{ML} = \{+1, +1, +1, -1\}$, the value of the first bit is constrained to be -1. Let us assume that a candidate symbol string found by the first ML detector 304a is {-1, -3}, corresponding to the bit sequence {-1, -1, -1, +1} (in which it can be seen that the first bit has value -1), and having a distance metric of 3.01. Now let us assume that at the same time the bit-wise ML detector for the second bit has found a candidate symbol string of {+1, +3}, corresponding to the bit sequence {+1, -1, +1, +1} (in which it can be seen that the second bit has value -1, the inverse of its value in $x_{ML}$), and having a distance metric of 5.1. It can be seen that the candidate symbol string {-1, -3} found by the first ML detector has a bit sequence {-1, -1, -1, +1} in which, coincidentally, the second bit has a value of -1, so that this is also a candidate symbol string for the second bit-wise ML detector. Moreover the distance metric of 3.01 for symbol string {-1, -3} is less than the smallest found so far by the second bit-wise ML detector (since the sphere decoder search navigates the search tree in a direction which progressively reduces the search radius). The radius of the second bit-wise ML detector is therefore updated to 3.01 (instead of 5.1), and this allows the search sphere to close in more rapidly to the constrained ML solution than it would otherwise. If the second bit-wise ML detector is started at a later time than the first bit-wise ML detector, or after the first bit-wise ML detector has completed its task, the second bit-wise ML detector may be initialised using the shared information from the first bit-wise detection task.

Figure 5 shows a schematic illustration of the above process, in which each bit-wise ML detector provides distance metrics and corresponding sequences of bits (for the candidate symbols found), which can then be employed to update the sphere radii for the bit-wise ML detectors where a candidate meets the required bit value constraint (reversed from the ML string of symbols) for a detector and has a smaller radius than any candidate yet found by the detector. In other words the sphere radius of the ML detector for bit i is updated when the new distance metric is smaller than that reached so far provided that the ith bit is the reverse of the ith bit in the ML symbol string. Thus in this example the shared information comprises bit sequences and corresponding distance metrics found by the bit-wise ML detectors, and this shared information is used to update the sphere radius of one or more of the bit-wise ML detectors.
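The update rule just described can be sketched as a small shared register, using the numbers of the two-antenna 4-PAM example. The class and method names are invented for illustration, and the register here only records the best metric per bit-wise detector; a real implementation would also need whatever access control the hardware requires.

```python
class SharedRegister:
    """Common memory area holding the best metric found so far for each bit."""

    def __init__(self, x_ml):
        self.x_ml = tuple(x_ml)                       # ML bit sequence
        self.radius_sq = [float("inf")] * len(x_ml)   # one bound per detector
        self.best_bits = [None] * len(x_ml)

    def report(self, bits, dist_sq):
        """Record a candidate found by any detector.

        Detector i benefits only if bit i is the reverse of its ML value
        and the metric is smaller than the one it holds so far.
        Returns the (0-based) indices of the detectors whose bound shrank.
        """
        updated = []
        for i, (b, b_ml) in enumerate(zip(bits, self.x_ml)):
            if b == -b_ml and dist_sq < self.radius_sq[i]:
                self.radius_sq[i] = dist_sq
                self.best_bits[i] = tuple(bits)
                updated.append(i)
        return updated


reg = SharedRegister(x_ml=(+1, +1, +1, -1))
# Candidate {+1, +3} found by the detector for the second bit.
first_update = reg.report((+1, -1, +1, +1), 5.1)
# Candidate {-1, -3} found by the detector for the first bit.
second_update = reg.report((-1, -1, -1, +1), 3.01)
```

After the second report the bound of the second bit-wise detector has dropped from 5.1 to 3.01, as in the text; in this particular example the candidate {-1, -3} happens to reverse every bit of the ML sequence, so it tightens the bounds of all four detectors.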

Referring back to Figure 4, the sphere decoder search procedure can be updated with the new distance metric estimate by assigning the new value $d_{n,j,\overline{ML}}^2$, found by the other bit-wise ML detector(s), to the radius $\rho^2$ if $d_{n,j,\overline{ML}}^2 < \rho^2$, i.e. $\rho^2 = \min(d_{n,j,\overline{ML}}^2, \rho^2)$. Also, with the knowledge of the sequence of symbols that have been searched where the node of the branch is cut off from the tree search by other ML detectors, the list of symbols to be searched $\mathrm{slist}_n$ is updated to exclude the eliminated node. The search is not generally continued with the symbol string corresponding to the updated distance metric because different bit-wise ML detectors perform searches from different subsets $X_{n,j}^{\overline{ML}}$, and if the search continued it could exclude a symbol string not part of the subset of the other bit-wise detector.

Where the ML detectors are implemented by a search algorithm which involves stepping through a list of possible candidate symbols, for example where the ML detectors are implemented by sphere decoders, an ML detector can skip the candidates already searched by other ML detectors, in order to reduce computational complexity.

This is facilitated by searching the sequence of symbols in a defined order. For example the sequence of symbols can be searched in an increasing order (in the above example, -3, -1, +1, +3) and thus an ML detector can skip symbols having values less than those already searched by other ML detectors, since these nodes in the search tree will already have been examined.

In the Figure 3a and 3b arrangements, in order to further increase the speed of the bit-wise sphere decoding, the initial search radius of the bit-wise sphere decoding can be determined after obtaining the maximum likelihood distance metrics $d_{ML}^2$. For example, the initial sphere radius of the bit-wise sphere decoder may be set to $\rho_{n,j,\mathrm{init}}^2 = 2\sigma^2 |L_E|_{MAX} + d_{ML}^2$, where $|L_E|_{MAX}$ is the maximum extrinsic LLR magnitude required by an application, for example between 5 and 50. Alternatively the list of candidates $\{s_1, s_2, \ldots, s_P\}$ searched by the maximum likelihood sphere decoder, together with their distance metrics $\{d_1^2, d_2^2, \ldots, d_P^2\}$, can be used to set or provide an additional constraint for the initial sphere radius $\rho_{n,j,\mathrm{init}}^2$ of the bit-wise sphere decoders if $d_i^2 < \rho_{n,j,\mathrm{init}}^2$, $i = 1, \ldots, P$. Mapping of the initial sphere radius $\rho_{n,j,\mathrm{init}}^2$, $n = 1, \ldots, n_T$, $j = 1, \ldots, q$, to the distance metrics $\{d_1^2, d_2^2, \ldots, d_P^2\}$ where $d_i^2 < \rho_{n,j,\mathrm{init}}^2$ is performed.
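A short sketch of these two ways of choosing the initial radius follows. The first function applies the expression 2σ²|L_E|max + d²_ML; by Equation 15, a non-ML candidate whose metric exceeds this bound could only produce an LLR magnitude above the required maximum, so excluding it loses nothing. The second function tightens the bound using candidates already visited by the ML sphere decoder. Function names, argument names and numbers are illustrative assumptions.

```python
def initial_radius_sq(d_ml_sq, sigma2, llr_max):
    """Bound derived from the largest LLR magnitude the application needs."""
    return 2.0 * sigma2 * llr_max + d_ml_sq


def tighten_with_candidates(radius_sq, candidates, bit_index, required_value):
    """Shrink the bound using candidates already searched by the ML decoder.

    candidates is an iterable of (bits, dist_sq); only candidates whose bit
    at bit_index has the required (reversed) value are usable.
    """
    for bits, dist_sq in candidates:
        if bits[bit_index] == required_value and dist_sq < radius_sq:
            radius_sq = dist_sq
    return radius_sq


# Candidates visited while finding x_ML = (+1, +1, +1, -1) with metric 0.02.
visited = [
    ((+1, +1, +1, -1), 0.02),
    ((+1, +1, +1, +1), 2.98),
    ((+1, +1, -1, -1), 3.86),
    ((+1, -1, +1, -1), 4.10),
]
rho0 = initial_radius_sq(d_ml_sq=0.02, sigma2=0.5, llr_max=5)

# Detector for the fourth bit needs that bit reversed to +1.
rho_bit4 = tighten_with_candidates(rho0, visited, bit_index=3, required_value=+1)
# Detector for the first bit needs that bit reversed to -1; nothing usable was visited.
rho_bit1 = tighten_with_candidates(rho0, visited, bit_index=0, required_value=-1)
```

In this example the detector for the fourth bit starts from 2.98 instead of 5.02, while the detector for the first bit keeps the LLR-based bound.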

For example, where maximum likelihood sphere decoding is performed, the symbol set $\{s_1, s_2, \ldots\}$ is searched and the solution for a given bit $x_j^n$ is found to be -1. A subset of three of the searched symbols, including $s_5$, is found to have this bit equal to +1, and of the corresponding distance metrics only $d_5^2$ is smaller than the current initial radius, the other two being larger than $\rho_{n,j,\mathrm{init}}^2$. The initial radius for bit-wise sphere decoding of this bit is therefore set as $\rho_{n,j,\mathrm{init}}^2 = d_5^2$, subject to any better starting value (which may arise where, for example, another bit-wise detector has already found a better solution). In a further alternative approach the sphere radius of the bit-wise sphere decoder is set to the distance metric of the maximum likelihood bit sequence with the sign of the bit to be detected inverted or flipped ($s_{ML,\mathrm{flipped}}$), so that $\rho_{n,j,\mathrm{init}}^2 = \|r - s_{ML,\mathrm{flipped}} H\|^2 - \sigma^2 x^T L_A$, where $x$ and $L_A$ are the bit sequence and the corresponding a priori LLR vector for $s_{ML,\mathrm{flipped}}$.

We next describe arrangements for making effective use of available hardware when implementing the above described decoders.

Some types of ML detector/decoder, for example a sphere decoder, have a variable complexity - that is, an ML candidate can be found after a variable number of operational iterations (further iterations then being unnecessary). However in a practical receiver the total number of available operations (for example processor operations or instruction cycles) is generally limited by the processor speed and data transmission rate. Thus the number of operations available for each ML detector is generally limited or bounded. However some of the ML detectors will need more operations than this limit to obtain the ML candidate, whereas others will require fewer operations.

One approach to this problem has previously been described in the Applicant's co-pending UK patent applications no. 0323208.9 and 0416823.3 filed on 3 Oct 2003 and 28 July 2004 respectively, the contents of which are hereby incorporated by reference.

Broadly speaking the ML detection (search) process is bounded and, in particular, candidate symbol searching in the sphere decoding process is stopped after a (predetermined) limiting number of symbols have been examined/distance determinations have been made. This is useful in a hardware implementation such as an FPGA or VLSI implementation, and in a software implementation for example on a DSP, as it allows a designer to know that a result will be available after a particular number of operations or clock cycles (or time).
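A bounded search of this kind might, purely as a sketch, look as follows. It assumes a real-valued system that has already been reduced to upper-triangular form R (for example by a QR decomposition), a binary alphabet, and a simple depth-first enumeration; the function name `bounded_sphere_search` and its parameters are hypothetical and are not taken from the referenced applications.

```python
def bounded_sphere_search(R, z, alphabet, max_ops, radius_sq=float("inf")):
    """Depth-first search for s minimising ||z - R s||^2, R upper triangular.

    The search stops once max_ops partial-distance evaluations have been made.
    Returns (best_symbols or None, best_metric, ops_used, completed).
    """
    n = len(z)
    best = {"s": None, "d": radius_sq}
    ops = 0
    completed = True
    s = [0.0] * n

    def descend(level, partial):
        nonlocal ops, completed
        for a in alphabet:
            if ops >= max_ops:
                completed = False      # budget exhausted with work remaining
                return
            ops += 1
            s[level] = a
            # Row `level` only involves the symbols level..n-1 fixed so far.
            resid = z[level] - sum(R[level][j] * s[j] for j in range(level, n))
            d = partial + resid * resid
            if d >= best["d"]:
                continue               # outside the current sphere: prune
            if level == 0:
                best["s"], best["d"] = list(s), d   # shrink the sphere
            else:
                descend(level - 1, d)

    descend(n - 1, 0.0)
    return best["s"], best["d"], ops, completed


# Toy 2x2 example (made-up numbers); z is the noise-free image of s = [1, -1].
R = [[1.0, 0.5], [0.0, 1.0]]
z = [0.5, -1.0]
full = bounded_sphere_search(R, z, (-1.0, 1.0), max_ops=50)
truncated = bounded_sphere_search(R, z, (-1.0, 1.0), max_ops=2)
```

With the larger budget the search completes and returns the exact solution; with a budget of two evaluations it returns the best candidate found so far and flags that the search was cut short, which is the property a designer relies on to bound the latency.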

Here we describe a system for efficiently stacking bit-wise ML detectors onto hardware comprising P parallel processors each configured (or configurable) for performing an ML detection task, in particular where the number of data processors is less than a number of bit-wise ML detectors required to be implemented. In decoder embodiments nTq bit-wise ML detectors are required to obtain a soft output at one signalling instant, that is over a time duration corresponding to that of a string of symbols to be decoded.

Figure 6 shows a timeline illustrating how these ML detection operations may be allocated to the ML detection data processors under these conditions. Each ML detection task comprises a period of parameter initialisation followed by a period of search operations, and the ML detection tasks are stacked so that an ML detector that completes its search before the maximum number of operations permitted immediately begins ML detection for the next bit allocated to it (for example the (P+1)-th bit). In embodiments this may be implemented by updating registers that store the operating parameters of the bit-wise ML detector.
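The stacking described above can be illustrated with a small scheduling simulation. This is a sketch under stated assumptions only: the cost of each task is taken as known in advance (in practice it is discovered as the search runs), the initialisation period is not modelled separately, and the function name `stack_tasks` and the workload figures are hypothetical.

```python
import heapq


def stack_tasks(required_ops, n_processors, max_ops_per_task, budget):
    """Greedy stacking of detection tasks onto parallel processors.

    required_ops[i] is the number of operations task i would need to finish.
    Returns a list of (task, processor, start, end, status) tuples.
    """
    # Each heap entry is (time at which the processor becomes free, processor id).
    free_at = [(0, p) for p in range(n_processors)]
    heapq.heapify(free_at)
    schedule = []
    for task, need in enumerate(required_ops):
        start, proc = heapq.heappop(free_at)
        if start >= budget:
            # No operations left in this signalling instant.
            schedule.append((task, None, None, None, "abandoned"))
            heapq.heappush(free_at, (start, proc))
            continue
        run = min(need, max_ops_per_task, budget - start)
        status = "completed" if run == need else "truncated"
        schedule.append((task, proc, start, start + run, status))
        # The processor starts its next task as soon as this one ends.
        heapq.heappush(free_at, (start + run, proc))
    return schedule


# Made-up workload: six bit-wise tasks stacked onto P = 2 processors.
plan = stack_tasks([30, 80, 20, 50, 40, 10], n_processors=2,
                   max_ops_per_task=60, budget=100)
```

Here the processor that finishes task 0 after 30 operations immediately takes task 2, while task 1 is cut off at the per-task limit and task 5 never starts because the budget for the signalling instant is spent.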

Where an ML detection task is not completed before the maximum number of operations permitted it may be truncated. In some cases, some of the ML detection tasks are abandoned when the total maximum number of operations is reached for that signalling instant. The approximate LLRs for those corresponding bits where the ML detection tasks are abandoned may be obtained from the distance metrics found by the other completed ML detection tasks. The order of the bit-wise ML detection may be intelligently selected, for example in accordance with an order of required soft output reliability for the bits. For example bits requiring high reliability may be started on a processor which finishes its previous detection task before the permitted maximum number of operations (to allow extra time for detection), or the ML detection task for the bit may be started after at least some of the other ML detection tasks, so that advantage may be taken of the results of these, or a higher permitted maximum number of operations may be allocated to the ML detection task, or some other prioritization may be employed. In general the soft output reliability will depend on the assistance of the other bit-wise ML detectors through the shared information and resources.
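One way of obtaining approximate LLRs for abandoned bits from the metrics found by other tasks is sketched below. It assumes the usual max-log approximation, in which the LLR of a bit is the difference between the smallest metric with the bit at -1 and the smallest metric with the bit at +1, scaled by 1/(2σ²), with a positive LLR favouring +1; the clipping value used when one hypothesis has not been seen at all, and the name `max_log_llrs`, are hypothetical choices.

```python
def max_log_llrs(n_bits, shared, sigma2, llr_clip):
    """Approximate bit LLRs from a pool of shared (bits, metric) pairs.

    shared:   list of (bit tuple with entries +1/-1, squared distance metric)
              gathered from every search that produced a candidate
    llr_clip: magnitude used when a bit value was never observed (assumption)
    """
    llrs = []
    for k in range(n_bits):
        best = {+1: None, -1: None}
        for bits, d in shared:
            v = bits[k]
            if best[v] is None or d < best[v]:
                best[v] = d
        if best[+1] is None:
            llrs.append(-llr_clip)     # only bit = -1 was ever seen
        elif best[-1] is None:
            llrs.append(+llr_clip)     # only bit = +1 was ever seen
        else:
            value = (best[-1] - best[+1]) / (2.0 * sigma2)
            llrs.append(max(-llr_clip, min(llr_clip, value)))
    return llrs


# Made-up pool of candidates from completed tasks; bit 2 was never seen as -1.
pool = [((+1, -1, +1), 1.0), ((-1, -1, +1), 3.0), ((+1, +1, +1), 2.0)]
llrs = max_log_llrs(3, pool, sigma2=0.5, llr_clip=5.0)
```

The third bit illustrates the abandoned case: no candidate with the inverse value is available, so the magnitude is simply clipped rather than computed.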

Figure 7 shows the stacking of ML detection operations in a max-log-MAP decoder as described with reference to Figures 3a and 3b, in which the decoder has two stages of operations, a first stage in which ML symbol string detection is performed, and a second stage in which bit-wise ML detection is performed (using a set of nTq ML detectors for Figure 3a and nTq - 1 ML detectors for Figure 3b to determine a bit-wise distance metric for each bit inverse of the ML symbol string). As can be seen from Figure 7, implementation of these two stages can overlap and, in particular, stage 1 detection for one signalling instant can overlap stage 2 detection for the previous signalling instant (in Figure 7 the end of the second row and the start of the third row in this figure both have stage 1 blocks; the second row is for k+2 and the start of row three is for k+3). For simplicity in Figure 7 the dividing lines between the first and second stages within a signalling instant are shown as straight but in practice, as can be seen in Figure 6, there need not be a sharp dividing line.

An ML detector algorithm as described above may be implemented using hardware comprising multiple pipeline stages, each pipeline stage comprising a processing element which executes the calculations corresponding to a specified step or steps of the algorithm. Here, the processing of the received signal from multiple signalling instances or different subcarriers for an OFDM system may interleave, i.e. different pipeline stages may perform the calculations to process the received signal from different signalling instances or different subcarriers. Here, one can view the processing of different ML detectors as being stacked through the pipeline stages in the concurrent processing.

In the parallel stacking arrangements of both Figures 6 and 7 the sharing of information between ML detectors facilitates efficient packing of the ML detection tasks and indirectly this mechanism improves the quality of the derived soft output when the number of operations allowed is limited. In the above decoders shared use can also be made of the bit LLR values. In the parallel stacking arrangements of Figures 6 and 7 soft bit output (LLR) values for the different bits will in general become available at different times and thus the available soft output from the i th bit-wise ML detector may be used by another j th (i ≠ j) ML detector to update its derivation of LLR values according to Equation 11. In most cases, the reliability of the detected bits improves at the output of the detector, i.e. |L_E| > |L_A|; thus, it is desirable to use a more reliable estimate of the i th transmitted bit when it is available for the j th (i ≠ j) bit-wise ML detection.

It will be appreciated that the above described parallel implementation techniques are not limited in their application to the use of sphere decoders/detectors. For example in one embodiment of the system the detector for the ML symbol string is implemented using a sphere decoder (which generally provides an output with a high accuracy/reliability) whilst the bit-wise ML detectors are implemented using a less complex (but potentially also less accurate) algorithm, such as a genetic algorithm.

Figure 8 shows curves of bit error rate against signal-to-noise ratio (Eb/No), comparing list sphere decoders (LSD) (according to the Applicant's earlier UK patent applications no. 0323208.9 and 0416823.3 (ibid) and to US2003/0076890) and embodiments of Max-log-MAP sphere decoders (MLMSD) implemented as described above. The curves relate to the half-rate turbo-coded BER simulated performance of max-log-MAP sphere decoders (MLMSD) and list sphere decoders (LSD) with fixed computational complexity. The list sphere decoder here only searches for 5 candidates closest to the received signal. The simulation was performed for an uncorrelated Rayleigh fading channel and 4x4 16QAM scheme transmissions.

The Max-Log-MAP sphere decoder was implemented with two parallel ML sphere decoders (i.e. P=2) with stacked operations as shown in Figure 6. The number n in the legend entries MLMSDn and LSDn signifies the number of distance metric operations to which the decoder is limited/bounded in one signalling instant. Thus curves 800, 802, 804 and 806 relate to MLMSDs limited to, respectively, 100, 150, 200 and 250 distance metric operations and curves 810, 812, 814, 816 and 818 relate to LSDs limited to, respectively, 100, 200, 300, 400 and 500 distance metric operations per signalling instant. An MLMSD gives the soft output zero-forcing solution if the ML detectors that provide the ML space-time symbols have not completed the bounded tree search. Curve 820 shows the non-limited max-log MAP performance. A two-parallel MLMSD100 decoder has a total number of 200 distance metric operations, which is similar to LSD200. In Figure 8 curves with the same marker (for example 800 and 812) have a similar maximum computational complexity.

From Figure 8 it can be seen that embodiments of co-operative max-log-MAP sphere decoders as described above provide improved performance compared to a list sphere decoder, and at the same time provide the advantage of a robust implementation.

Degradation of the performance from max-log MAP detection depends on the limitation on the number of operations, which is generally limited by the hardware. The degradation will also depend on the overall implementation of the ML detector components, that is on how the operations are stacked and the information shared. The performance of the MLMSD improves with higher levels of parallelism.

Figure 9 shows a receiver 900 incorporating an embodiment of a decoder in accordance with the invention.

Receiver 900 comprises one or more receive antennas 902a, b (of which two are shown in the illustrated embodiment) each coupled to a respective rf front end 904a,b, and thence to a respective analogue-to-digital converter 906a,b and thence to a digital signal processing (DSP) system 908 implementing the decoder.

As illustrated the digital signal processing system 908 comprises a plurality of data processors 910a-c each coupled to respective working memory 912a-c and to permanent program memory 914a-c; a control processor 914 is also provided to control the distribution of information and/or tasks between processors 910 and, conveniently, to provide input/output and other shared functions. Processor 914 is coupled to an optional shared memory buffer 916 to facilitate the sharing of information between processors 910 (this may not be needed in some configurations of processors 910).

Processor 914 is also coupled to permanent program memory 918. The contents of memory 914 and/or 918 may be provided on a carrier such as an optical or electrical signal carrier or, as illustrated, on a storage medium 920. The digital signal processing system 908 has a decoded baseband data output 922, for example to a baseband data processor (not shown) configured for implementing higher level protocols.

As illustrated program memory 914a-c stores instances of bit-wise ML detection code.

Program memory 918 stores decoder code 918a comprising ML symbol string decoder code (for example sphere decoder code comprising code to generate a lattice from a matrix channel estimate, and tree building/searching code), demodulator code, max-log MAP calculation code, and task distribution code, which code is loaded and implemented by control processor 914 to provide the corresponding functions as described above.

Program memory 918 also includes MIMO channel estimation code 918b to provide a MIMO channel estimate H, and, optionally, de-interleaver code 918c, interleaver code 918d, and channel decoder code 918e. Implementations of de-interleaver code, interleaver code, and channel decoder code are well known to those skilled in the art.

The receiver front-end will generally be implemented in hardware whilst the receiver processing will usually be implemented at least partially in software, although one or more ASICs and/or FPGAs may also be employed. The skilled person will recognise that all the functions of the receiver could be performed in hardware and that the exact point at which the signal is digitised in a software radio will generally depend upon a cost/complexity/power consumption trade-off.

Figure 10a shows an example configuration of processors 910, 914 in the DSP system 908 of Figure 9, in which like elements to those of Figure 9 are indicated by like reference numerals. The system includes a 'farmer' processor 914 and multiple 'worker' processors 910a-c. The 'farmer' and the 'worker' processes interact through a connector 1000 that buffers the list of tasks and stores the results of the tasks. The responsibilities of the 'farmer' processor include generating a set of tasks and queuing, monitoring and prioritizing the tasks for the 'worker' processors. Additionally, the 'farmer' collects the results and determines when the computation has finished. Each of the workers repetitively takes a task from the buffer 1000, computes the result for that task, and places the result in the buffer 1000. The farmer processor 914 implements a queuing system which controls data input and output for processors 910 assisted by monitoring and prioritizing facilities to provide efficient use of processor elements 910 and of processor time. Techniques for implementing a queuing, monitoring and prioritizing system using farmer processor 914 are well known to those skilled in the art of load distribution in parallel data processing multiprocessor systems.
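The farmer/worker interaction described above can be sketched in software using threads and queues from the Python standard library. This is an illustrative analogue only, not the hardware arrangement of the figure: the `detect` function is a hypothetical stand-in for one bit-wise ML detection task, and the two queues together play the role of the buffer 1000.

```python
import queue
import threading


def detect(payload):
    """Hypothetical stand-in for one bit-wise ML detection task."""
    return min(payload)


def worker(tasks, results):
    """Repeatedly take a task, compute its result, and deposit the result."""
    while True:
        task = tasks.get()
        if task is None:               # sentinel from the farmer: no more work
            break
        bit_index, payload = task
        results.put((bit_index, detect(payload)))


def farmer(workloads, n_workers=3):
    """Generate and queue the tasks, then collect every result."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for bit_index, payload in enumerate(workloads):
        tasks.put((bit_index, payload))
    for _ in threads:                  # one sentinel per worker
        tasks.put(None)
    for t in threads:                  # computation has finished once all exit
        t.join()
    collected = {}
    while not results.empty():
        bit_index, value = results.get()
        collected[bit_index] = value
    return collected


# Made-up candidate metrics for four bit-wise tasks shared among three workers.
metrics = farmer([[4.0, 2.5], [1.0, 3.0], [7.5], [0.5, 0.9]])
```

Because each worker simply pulls the next task when it becomes free, a worker that finishes early is automatically given more work, which mirrors the stacking behaviour without any explicit timetable.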

Figure 10b shows flow diagrams of a task distribution system for embodiments of the above described decoder. More particularly Figure 10b shows procedures for control units implementing the task distribution/processing of the farmer and worker processors. The functions out(), in(), read(), respectively, correspond to depositing a task, result or command into the task buffer, removing the task, result or command from the task buffer and reading the task, result or command from the task buffer. Monitoring the tasks preferably involves obtaining information from the worker processor and task buffer such as the number of tasks in the queue, the priority of the task and the shared information from the worker processors. The queuing function preferably involves reordering or prioritizing the tasks in the buffer or allocating tasks to a specific worker processor according to the new information obtained.
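A minimal sketch of a task buffer offering the three operations named above is given below. It assumes a priority-ordered buffer in which a lower number means a more urgent item; the class name `TaskBuffer`, the `reprioritise` helper, and the spelling `in_` (the word `in` is reserved in Python) are choices made for this illustration and are not taken from the figure.

```python
import heapq
import itertools


class TaskBuffer:
    """Priority-ordered buffer offering out(), in_() and read()."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()   # tie-breaker keeps insertion order

    def out(self, item, priority=0):
        """Deposit a task, result or command (lower number = more urgent)."""
        heapq.heappush(self._heap, (priority, next(self._order), item))

    def in_(self):
        """Remove and return the most urgent item."""
        return heapq.heappop(self._heap)[2]

    def read(self):
        """Return the most urgent item without removing it."""
        return self._heap[0][2]

    def reprioritise(self, predicate, priority):
        """Give a new priority to every queued item matching predicate."""
        self._heap = [(priority if predicate(it) else p, n, it)
                      for p, n, it in self._heap]
        heapq.heapify(self._heap)

    def __len__(self):
        return len(self._heap)            # queue length, useful for monitoring


buf = TaskBuffer()
buf.out("bit 3", priority=2)
buf.out("bit 0", priority=1)
buf.out("bit 7", priority=2)
first = buf.read()                        # look without removing
buf.reprioritise(lambda it: it == "bit 7", 0)
order = [buf.in_() for _ in range(len(buf))]
```

The reprioritisation step stands for the queuing function reacting to new information, for example a bit that turns out to need a more reliable soft output.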

Additionally or alternatively embodiments of the above described decoder systems may be provided as a signal processing module. Such a signal processing module may implement any of a variety of functions including, but not limited to, a channel decoder, a soft-in/soft-out space-time detector/equaliser, and a multi-user detector.

Figure 11 shows a block diagram of a transmitter with concatenated channel encoders; the frequency selective channel can be considered to be an 'encoder'. In Figure 11 Coder 2 may comprise a conventional channel encoder and Coder 1 an STBC coder in combination with the channel.

Figure 12 shows a block diagram of a receiver with concatenated channel decoders or detectors, suitable for use with the transmitter of Figure 11. In Figure 12 detector or Decoder 1 may comprise a max-log decoder as described above, and Decoder 2 a conventional channel decoder. Figure 13 shows a block diagram of a variant of the receiver of Figure 12, with concatenated decoders or detectors employing iterative or "turbo" decoding. Figure 14 shows a block diagram of a receiver comprising two instances of Decoder 1, which may comprise, for example, a space-time decoder. In Figure 14 the output of one decoder provides a priori knowledge for the other decoder.

In this way the decoder component iteratively exchanges soft information in effect with itself to improve the reliability of the detected data. The received signal is provided to both decoders, optionally (depending upon the interleaving arrangement at the transmitter) interleaved in one case.

Embodiments of the invention have applications in many types of communication system, including MIMO and multiuser systems, for example for wireless computer or phone networking. In multiuser systems, for example, the generator matrix or equivalent channel matrix may represent a combination of spreading and channel effects for the users (see, for example, L. Brunel, "Optimum Multiuser Detection for MC-CDMA Systems Using Sphere Decoding", 12th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, Volume 1, 30 Sept.-3 Oct. 2001, pages A-16 - A-20 vol.1, hereby incorporated by reference).

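In other applications the decoder can be applied as a block equaliser for frequency selective fading. Here, the channel model of Equation 5 may be modified to take into account the channel memory as shown below:

$$\mathbf{r} = \mathbf{s}\mathbf{H} + \mathbf{v}$$

where

$$\mathbf{H} = \begin{bmatrix} H_1 & H_2 & \cdots & H_L & & & \\ & H_1 & H_2 & \cdots & H_L & & \\ & & \ddots & & & \ddots & \\ & & & H_1 & \cdots & H_{L-1} & H_L \end{bmatrix}$$

$$\mathbf{r} = [\, r_1 \; \cdots \; r_{T+L-1} \,], \quad \mathbf{s} = [\, s_1 \; s_2 \; \cdots \; s_T \,], \quad \mathbf{v} = [\, v_1 \; v_2 \; \cdots \; v_{T+L-1} \,]$$

and where T is the length of the symbol block being equalized and $H_i$, $i = 1, \ldots, L$, is the i th MIMO channel tap. The procedure may then be employed to detect the transmitted block s.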
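As an illustration of the block equaliser model, the banded block matrix H can be assembled from the channel taps as sketched below. The sketch assumes the row-vector convention r = sH + v, taps of size nT x nR stored as nested lists, and a block of T symbol vectors producing T + L - 1 received vectors; the function names and tap values are hypothetical.

```python
def block_channel(taps, T):
    """Build the block-banded matrix H for r = sH + v from L channel taps.

    taps: list of L matrices (lists of rows), each n_t x n_r
    T:    number of symbol vectors in the block
    Returns a (T * n_t) x ((T + L - 1) * n_r) matrix as a list of rows.
    """
    L = len(taps)
    n_t, n_r = len(taps[0]), len(taps[0][0])
    H = [[0.0] * ((T + L - 1) * n_r) for _ in range(T * n_t)]
    for t in range(T):                     # block row: symbol time t
        for l, tap in enumerate(taps):     # tap l lands in block column t + l
            for a in range(n_t):
                for b in range(n_r):
                    H[t * n_t + a][(t + l) * n_r + b] = tap[a][b]
    return H


def transmit(s, H):
    """Noise-free received row vector r = sH."""
    return [sum(s[i] * H[i][j] for i in range(len(s)))
            for j in range(len(H[0]))]


# Single-antenna toy case with two taps (made-up values): H_1 = [1.0], H_2 = [0.5].
H = block_channel([[[1.0]], [[0.5]]], T=3)
r = transmit([1.0, -1.0, 1.0], H)
```

Each received sample is then the sum of the current symbol through the first tap and the previous symbol through the second, which is the channel memory the equaliser has to resolve.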

Broadly speaking, any type of detector may be employed in the above described systems as long as it is able to provide a maximum-likelihood solution and a corresponding distance metric - sphere decoders are merely used by way of example.

Embodiments of the invention can be applied as a channel decoder where the channel encoder can be represented by a linear generator matrix G. Examples are block channel codes (see "Digital Communications: Fundamentals and Applications", Bernard Sklar, Prentice Hall International Editions, 1999, 0-13-212713-X) such as Hamming code and Low Density Parity Check (LDPC) coding where the codeword x is generated by the generator matrix G from the information bits s through x = sG, where the vector s contains the information bits. For LDPC code, for example, the generator matrix G is derived from the parity check matrix H to fulfil the orthogonality requirement GH^T = 0 and any legitimate codeword will satisfy the condition xH^T = 0. Here, the information and codeword blocks, s and x, respectively, are comprised of binary digits, i.e. 1 and 0, and the matrix operations are in a binary field. In an example implementation, a sphere decoder with input r and using G as the generator matrix, determines the distance between the received signal r and each of the possible transmitted codewords in its search. The codeword with the minimum distance is the maximum likelihood codeword. This approach employs a translation of the information and codeword blocks from a binary field, {0,1} to signed values {-1, +1}, and arithmetic operations are then used.
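The relations x = sG and xH^T = 0, and minimum-distance decoding over signed values, can be illustrated with a systematic (7,4) Hamming code. The particular G and H below are one standard textbook choice rather than matrices taken from this specification, the mapping of 0 to -1 and 1 to +1 is an assumed convention, and the exhaustive search over all sixteen codewords stands in for the pruned search a sphere decoder would perform.

```python
from itertools import product

# Systematic (7,4) Hamming code with G = [I | P] and H = [P^T | I].
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def encode(s):
    """x = sG over the binary field."""
    return [sum(si * g for si, g in zip(s, col)) % 2 for col in zip(*G)]


def syndrome(x):
    """x H^T over the binary field; all zeros for a legitimate codeword."""
    return [sum(xi * h for xi, h in zip(x, row)) % 2 for row in H]


def to_signed(bits):
    """Map {0, 1} to {-1, +1} (0 -> -1 is an assumed convention)."""
    return [2 * b - 1 for b in bits]


def ml_decode(r):
    """Exhaustive minimum-distance search over all information words."""
    best = min(product((0, 1), repeat=len(G)),
               key=lambda s: sum((ri - xi) ** 2
                                 for ri, xi in zip(r, to_signed(encode(s)))))
    return list(best)


info = [1, 0, 1, 1]
codeword = encode(info)
# Made-up noise; the third sample is pushed across the decision threshold.
noise = [0.3, -0.2, -1.4, 0.1, 0.2, -0.3, 0.1]
noisy = [v + n for v, n in zip(to_signed(codeword), noise)]
decoded = ml_decode(noisy)
```

A hard decision on the third sample would be wrong here, but the minimum-distance codeword is still the transmitted one, so the information bits are recovered.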

The skilled person will appreciate that the above described techniques may be employed for example in base stations, access points, and/or mobile terminals/phones and may be embodied in a wireless chipset or signal processor. The techniques are applicable in 3G and 4G digital mobile communications systems, wireless local and personal area networks and in many other types of communication system. Broadly speaking embodiments of the invention facilitate cheaper receivers without a loss of performance, or equivalently increased data rates without correspondingly increased complexity and cost. Embodiments of the invention may also potentially find application in non-radio systems, for example a disk drive with multiple read heads and multiple data recording layers in effect acting as multiple transmitters.

No doubt many other effective alternatives will occur to the skilled person. It will be understood that the invention is not limited to the described embodiments and encompasses modifications apparent to those skilled in the art lying within the spirit and scope of the claims appended hereto.

Claims

CLAIMS:
1. A decoder for decoding a received signal, said received signal being provided by one or more transmitted signals defining a string of symbols, each said symbol comprising one or more bits, said decoder comprising: a plurality of maximum likelihood (ML) detectors each configured to determine a minimum bit-dependent distance metric for a respective candidate string of symbols in which a bit has a defined value, said distance metric being dependent upon a distance of said received signal from an estimated received signal determined from said candidate string; and a bit likelihood estimator to receive a said minimum distance metric from each of said ML detectors and configured to determine a bit likelihood value for each bit of said transmitted string dependent upon said minimum distance metrics; the decoder further comprising: a plurality of data processors, each configured to implement an ML detection task for a said ML detector; and a task distribution system configured to distribute a plurality of said ML detection tasks for said plurality of ML detectors amongst said plurality of data processors.
2. A decoder as claimed in claim 1 wherein a said ML detection task has a variable duration, wherein a said task is controlled to limit said task duration, and wherein said task distribution system is configured to allocate said ML detection tasks to said processors such that said distance metric determinations by said ML detectors are substantially performed within a time duration of said transmitted string of symbols.
3. A decoder as claimed in claim 2 wherein said task distribution system is configured to allocate said ML detection tasks to said processors such that at least one of said processors implements two of said ML detection tasks within said time duration.
4. A decoder as claimed in claim 3 wherein said task distribution system is configured to allocate said ML detection tasks to said processors such that each of said processors implements two of said ML detection tasks within said time duration.
5. A decoder as claimed in any one of claims 1 to 4 further comprising a maximum likelihood detector configured to determine a common, maximum likelihood distance metric for each bit of a maximum likelihood string of symbols.
6. A detector as claimed in claim 5 wherein said task distribution system is configured to allocate detection tasks to processors in two stages, said stages comprising a first stage in which a task for said common, maximum likelihood distance metric determination is allocated and a second stage in which at least some of said ML detection tasks are allocated.
7. A detector as claimed in claim 6 wherein said first and second stages overlap in time such that said second stage for a first said string of symbols is partially concurrent with said first stage for a second, subsequent said string of symbols.
8. A detector as claimed in any one of claims 1 to 7 wherein said task distribution system is configured to distribute said ML detection tasks in an order dependent upon a desired reliability of an associated said bit likelihood value.
9. A detector as claimed in any one of claims 1 to 8 comprising a farmer processor for implementing said task distribution system.
10. A detector as claimed in claim 9 further comprising a task buffer to couple each of said plurality of data processors to said farmer processor.
11. A detector as claimed in any preceding claim wherein at least one of said ML detectors is configured to share information from said minimum distance metric determination with at least one other of said ML detectors.
12. A method of decoding a received signal, said received signal being provided by one or more transmitted signals defining a string of symbols, each said symbol comprising one or more bits, the method employing a plurality of detectors one allocated to each bit of said string, the method comprising: determining, for each bit of said string using a detector allocated to the bit, a minimum bit-dependent distance metric for a respective candidate string of symbols in which a bit has a defined value, said distance metric being dependent upon a distance of said received signal from an estimated received signal determined from said candidate string; and determining a bit likelihood value for each bit of said transmitted string dependent upon said minimum distance metrics; and wherein the method further comprises: implementing said bit-allocated detectors using a plurality of data processors, a number of said data processors being less than a number of said detectors; and controlling the distribution of said distance metric determining by said detectors to distribute said determining amongst said plurality of data processors.
13. A method as claimed in claim 12 wherein said distance metric determining has a variable duration, the method further comprising limiting said duration of said distance metric determining.
14. A method as claimed in claim 13 further comprising controlling said distribution such that said determining is substantially performed within a duration of said transmitted string of symbols.
15. A method as claimed in claim 14 wherein said controlling comprises implementing at least two of said detectors on a said data processor within said symbol string duration.
16. A method as claimed in claim 15 wherein said controlling comprises implementing at least two of said detectors on each said data processor within said symbol string duration.
17. A method as claimed in any one of claims 12 to 16 wherein said determining includes determining a common, maximum likelihood distance metric for each bit of a maximum likelihood string of symbols.
18. A method as claimed in claim 17 wherein said controlling comprises controlling said distance metric determining such that said determining of said bit-dependent distance metrics for a first string of said symbols is performed partially concurrently with said determining of said common distance metric for a second, subsequent said string of symbols.
19. A method as claimed in any one of claims 12 to 18 wherein said bit-dependent distance metric determining for said bits of said string is performed in an order dependent upon a desired reliability of a bit likelihood value for a said bit.
20. A method as claimed in any one of claims 12 to 19 further comprising sharing information from said distance metric determining between at least two of said ML detectors.
21. Processor control code to, when running, implement the method of any one of claims 12 to 20.
22. A carrier carrying the processor control code of claim 21.
23. A receiver or decoder including the carrier of claim 22.