GB2446659A

GB2446659A - Controlling LLL Lattice reduction runtimes in wireless MIMO receivers

Info

Publication number: GB2446659A
Application number: GB0703184A
Authority: GB
Inventors: Henning Vetter
Original assignee: Toshiba Research Europe Ltd
Current assignee: Toshiba Europe Ltd
Priority date: 2007-02-19
Filing date: 2007-02-19
Publication date: 2008-08-20
Anticipated expiration: 2027-02-19
Also published as: GB0703184D0; GB2446659B

Abstract

In a wireless Multiple Input Multiple Output (MIMO) receiver of Space Time Block Coded (STBC) data via several antennas 27, a Lenstra-Lenstra-Lovasz (LLL) lattice reduction algorithm in a decoder 30 (Figs. 4 and 6) is improved by applying Length Reduction (LR) or Column Exchange (CE) operations (Fig. 7) to the estimated channel matrix H in a looping pipeline (40, Fig. 6). This allows the runtime to be determined in advance more accurately than with prior LLL algorithms, for which the required number of WHILE loops is unpredictable. The number of processing steps depends on the constellation employed in the transmission (Fig. 1), with an input matrix being presented and re-presented to the processing pipeline either a predetermined number of times or as long as lattice reduction remains incomplete (Fig. 6). Lattice reduction converts the channel matrix H (estimated from pilot signals) into a more nearly orthogonal matrix $\tilde{H}$ having shorter basis vectors. This matrix is used to correct the quantized detected values $\hat{z}$ and to build a table of candidate signals c, from which the best candidate signal x is finally selected; this is the receiver's best guess at the signal that was originally transmitted.

Description

Wireless Communication Apparatus

The present invention is concerned with a wireless communications apparatus, and particularly apparatus configured to receive and decode a signal transmitted from a MIMO transmitter. The invention is particularly, but not exclusively, concerned with complexity reduction in a MIMO receiver.

Many communication systems can be represented by the following system model: $y = Hx + v$. Here y is the n-by-1 received signal vector, H is the n-by-m channel matrix, x is the m-by-1 transmit symbol vector and v is the n-by-1 noise vector, where m and n denote the number of transmit and receive antennas respectively. One common example of such a system is MIMO communication, where m and n are greater than 1. Another example application could be multi-user detection in CDMA systems.

Recent publications have demonstrated how the use of lattice reduction can improve the performance of MIMO detection methods. For example, "Lattice-Reduction-Aided Detectors for MIMO Communication Systems" (H. Yao and G. W. Wornell, Proc. IEEE Globecom, Nov. 2002, pp. 424-428) describes lattice-reduction (LR) techniques for enhancing the performance of multiple-input multiple-output (MIMO) digital communication systems.

In addition, "Low-Complexity Near-Maximum-Likelihood Detection and Precoding for MIMO Systems using Lattice Reduction" (C. Windpassinger and R. Fischer, in Proc. IEEE Information Theory Workshop, Paris, March 2003, pp. 346-348) studies the lattice-reduction-aided detection scheme proposed by Yao and Wornell. It extends this with the use of the well-known LLL algorithm, which enables application to MIMO systems with arbitrary numbers of dimensions.

"Lattice-Reduction-Aided Receivers for MIMO-OFDM in Spatial Multiplexing Systems" (I. Berenguer, J. Adeane, I. Wassell and X. Wang, in Proc. Int. Symp. on Personal, Indoor and Mobile Radio Communications, Sept. 2004, pp. 1517-1521, hereinafter referred to as "Berenguer et al.") describes the use of Orthogonal Frequency Division Multiplexing (OFDM) to significantly reduce receiver complexity in wireless systems with multipath propagation. It also notes the proposed use of OFDM in wireless broadband multi-antenna (MIMO) systems.

Finally, "MMSE-Based Lattice-Reduction for Near-ML Detection of MIMO Systems" (D. Wubben, R. Bohnke, V. Kuhn and K. Kammeyer, in Proc. ITG Workshop on Smart Antennas, 2004, hereinafter referred to as "Wubben et al.") adapts the lattice-reduction-aided schemes described above to the MMSE criterion.

Furthermore, "Lattice reduction aided pre-coding" (C. Windpassinger, R. F. H. Fischer and J. B. Huber, IEEE Transactions on Communications, Dec. 2004, Vol. 52, Issue 12, pp. 2057-2060) describes a way in which the above teachings can be applied to MIMO pre-coding. In particular, that document describes a precoding scheme for multi-user broadcast communications. The scheme aims to avoid a full sphere decoder implementation, because of the consequent complexity, but also does not fall back to an approximate closest-point solution, which would lead to a sub-optimal result.

This technique makes use of the fact that, mathematically, the columns of the channel matrix, H, can be viewed as describing the basis of a lattice. An equivalent description of this lattice (a so-called 'reduced basis') can therefore be calculated such that the basis vectors are close to orthogonal. If the receiver then uses this reduced basis to equalise the channel, noise enhancement can be kept to a minimum and detection performance will improve (as illustrated, for example, in figure 5 of Wubben et al.).

This process is as follows. $y_r$, $x_r$ and $H_r$ are defined to be the real-valued representations of y, x and H respectively, such that:

$$y_r = \begin{bmatrix} \mathrm{Re}(y) \\ \mathrm{Im}(y) \end{bmatrix}, \quad x_r = \begin{bmatrix} \mathrm{Re}(x) \\ \mathrm{Im}(x) \end{bmatrix}, \quad H_r = \begin{bmatrix} \mathrm{Re}(H) & -\mathrm{Im}(H) \\ \mathrm{Im}(H) & \mathrm{Re}(H) \end{bmatrix}$$

Here Re(·) and Im(·) denote the real and imaginary components of their arguments. The noise vector $v_r$ is formed from v in the same way as $y_r$.
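As a minimal sketch of the real-valued representation above, the stacking can be checked numerically. The sketch assumes NumPy, a noiseless 3×3 channel and 16-QAM symbols. The function names `to_real_model` and `to_real_vector` are illustrative and do not come from the source.

```python
import numpy as np

def to_real_model(y, H):
    """Real-valued form: y_r = [Re(y); Im(y)], H_r = [[Re(H), -Im(H)], [Im(H), Re(H)]]."""
    y_r = np.concatenate([y.real, y.imag])
    H_r = np.block([[H.real, -H.imag],
                    [H.imag, H.real]])
    return y_r, H_r

def to_real_vector(x):
    """x_r = [Re(x); Im(x)]; the noise v_r would be stacked the same way."""
    return np.concatenate([x.real, x.imag])

rng = np.random.default_rng(0)
n, m = 3, 3                                    # receive / transmit antennas
H = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
levels = np.array([-3, -1, 1, 3])              # per-dimension 16-QAM amplitudes
x = rng.choice(levels, m) + 1j * rng.choice(levels, m)
y = H @ x                                      # noiseless, so the identity holds exactly
y_r, H_r = to_real_model(y, H)
x_r = to_real_vector(x)
print(np.allclose(y_r, H_r @ x_r))             # True
```

The complex product Hx is reproduced exactly by $H_r x_r$, which is why the lattice reduction that follows can work purely with real-valued matrices.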

It will be appreciated that this is illustrative; a description of a complex-valued example is set out in Berenguer et al. A number of lattice reduction algorithms exist. Any one of them can be used to calculate a transformation matrix, T, such that a reduced basis, $\tilde{H}_r$, is given by $\tilde{H}_r = H_r T$. The matrix T contains only integer entries and its determinant is ±1, and it is therefore called a unimodular matrix.

One suitable lattice reduction algorithm is the Lenstra-Lenstra-Lovasz (LLL) algorithm referred to above. It is disclosed in Wubben et al., and also in "Factoring Polynomials with Rational Coefficients" (A. Lenstra, H. Lenstra and L. Lovasz, Math. Ann., Vol. 261, pp. 515-534, 1982, hereinafter referred to as "Lenstra et al."), and in "An Algorithmic Theory of Numbers, Graphs and Convexity" (L. Lovasz, Philadelphia, SIAM, 1980, hereinafter referred to as "Lovasz").

Lattice reduction re-expresses the mathematical description of the communications system as:

$$y_r = H_r x_r + v_r = H_r T T^{-1} x_r + v_r = \tilde{H}_r z_r + v_r$$

where $z_r = T^{-1} x_r$. The received signal, $y_r$, in this redefined system can then be equalised to obtain an estimate of $z_r$. This equalisation process could take a number of forms. Examples include linear zero forcing (ZF) or minimum mean-square error (MMSE) techniques, and more complex methods based on successive interference cancellation, of which various examples are presented in the art.
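The invariance of the noiseless received signal under a unimodular change of basis can be checked numerically. This is a NumPy sketch. The T below is a hypothetical unimodular matrix chosen by hand for illustration; it is not the output of any lattice reduction algorithm, and the channel values are likewise made up.

```python
import numpy as np

# Hypothetical unimodular T: integer entries, determinant +1 (chosen by hand).
T = np.array([[1, 1, 0],
              [0, 1, -2],
              [0, 0, 1]])
T_inv = np.round(np.linalg.inv(T)).astype(int)   # inverse of a unimodular matrix is integer too

H_r = np.array([[1.0, 0.9, 0.2],
                [0.1, 1.0, 0.8],
                [0.3, 0.2, 1.0]])                 # illustrative real-valued channel
x_r = np.array([3, -1, 1])                        # real-valued 16-QAM components

H_red = H_r @ T                                   # reduced basis H~_r = H_r T
z_r = T_inv @ x_r                                 # z_r = T^{-1} x_r, here [2, 1, 1]
print(np.allclose(H_r @ x_r, H_red @ z_r))        # True: same noiseless received signal
```

Because $T^{-1}$ is itself an integer matrix, $z_r$ stays on a (shifted and scaled) integer lattice. This is what makes the rounding-based quantisation described below possible.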

For example, if the ZF criterion is used, then

$$\hat{z}_r = (\tilde{H}_r^T \tilde{H}_r)^{-1} \tilde{H}_r^T y_r$$

Since $\tilde{H}_r$ is close to orthogonal, $\hat{z}_r$ should suffer much less noise enhancement than if the receiver directly equalised the channel $H_r$.
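The following is a minimal NumPy sketch of ZF equalisation in the reduced basis. It uses a hand-picked 2×2 example (not from the source) in which subtracting the first basis vector from the second makes the columns orthogonal. The trace of $(H^T H)^{-1}$ is used as a simple measure of total ZF noise enhancement for white, unit-variance noise. The function names are illustrative.

```python
import numpy as np

def zf_equalise(H_red, y_r):
    """ZF estimate z_hat = (H~^T H~)^{-1} H~^T y_r, solved via the normal equations."""
    return np.linalg.solve(H_red.T @ H_red, H_red.T @ y_r)

def noise_enhancement(H):
    """trace((H^T H)^{-1}): total ZF output noise power for unit-variance white noise."""
    return np.trace(np.linalg.inv(H.T @ H))

H_r = np.array([[1.0, 1.0],
                [0.0, 0.1]])        # two nearly parallel basis vectors
T = np.array([[1, -1],
              [0, 1]])              # unimodular: subtract column 1 from column 2
H_red = H_r @ T                     # [[1, 0], [0, 0.1]]: orthogonal columns

z = np.array([1.0, -2.0])
print(zf_equalise(H_red, H_red @ z))                     # recovers z in the noiseless case
print(noise_enhancement(H_r), noise_enhancement(H_red))  # about 201 versus 101
```

In this toy case the reduced basis roughly halves the total noise power passed by the ZF equaliser, even though both bases describe the same lattice.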

Assuming that the transmitted symbols contained in x are obtained from an M-QAM constellation, $\hat{z}_r$ can then be quantised as:

$$\bar{z}_r = \alpha\, Q\left\{ \frac{1}{\alpha}\left(\hat{z}_r - T^{-1}\mathbf{1}\beta\right) \right\} + T^{-1}\mathbf{1}\beta$$

Here Q{·} is the quantisation function that rounds each element of its argument to the nearest integer, and $\mathbf{1}$ is a 2m-by-1 vector of ones. The scalar values α and β are parameters of the constellation: α is equal to the minimum distance between two constellation points, while β is the minimum distance between constellation points and the axes. Apart from the quantisation function, the remaining operations are a result of M-QAM constellations being scaled and translated versions of the integer lattice. It therefore follows that the integer quantisation requires the same simple scaling and translation operations.
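A sketch of this quantisation in NumPy. It assumes 16-QAM with real-valued amplitudes ±1 and ±3, so that α = 2 and β = 1, and a hypothetical 2×2 unimodular T. Note that `np.round` rounds exact halves to the nearest even integer; this only matters for estimates lying exactly between two lattice points.

```python
import numpy as np

def quantise(z_hat, T_inv, alpha=2.0, beta=1.0):
    """z_bar = alpha * Q{(z_hat - T^{-1} 1 beta) / alpha} + T^{-1} 1 beta, Q = round."""
    offset = beta * (T_inv @ np.ones(len(z_hat)))   # T^{-1} 1 beta
    return alpha * np.round((z_hat - offset) / alpha) + offset

T = np.array([[1, 1],
              [0, 1]])                   # hypothetical unimodular T
T_inv = np.array([[1, -1],
                  [0, 1]])
x_r = np.array([3, -1])                  # 16-QAM amplitudes: alpha = 2, beta = 1
z_r = T_inv @ x_r                        # [4, -1]
z_hat = z_r + np.array([0.3, -0.4])      # equaliser output with some residual noise
print(quantise(z_hat, T_inv))            # recovers z_r = [4, -1]
```

The subtraction of $T^{-1}\mathbf{1}\beta$ and the division by α map the transformed constellation onto the plain integer lattice. Rounding is done there, and the translation and scaling are then undone.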

Figure 1 illustrates an example of a suitable constellation for use with M-QAM; in this case, 16-QAM is employed. The scalar values α and β defined above are illustrated for reference. It will be appreciated that the illustrated example is but one of various constellations that can be used.

Finally, the estimate of x can be obtained from the quantised vector $\bar{z}_r$ as $\hat{x}_r = T\bar{z}_r$. Occasionally, if errors are present in $\bar{z}_r$, some of the symbol estimates in $\hat{x}_r$ may not be valid symbols. In such cases these symbols are mapped to the nearest valid symbol. For instance, in the example illustrated in figure 1, the values ±1 and ±3 define the valid entries in $\hat{x}_r$. Therefore, if a component of $\hat{x}_r$ were, for example, equal to +5, it would be mapped to a value of +3.
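This back-transformation and nearest-symbol mapping can be sketched in NumPy. The sketch assumes the 16-QAM amplitudes of figure 1 and a hypothetical unimodular T. The quantised estimate below lies on the transformed lattice but corresponds to an invalid amplitude.

```python
import numpy as np

def detect_symbols(z_bar, T, levels=(-3, -1, 1, 3)):
    """x_hat = T z_bar, then map each component to the nearest valid amplitude."""
    x_hat = T @ z_bar
    valid = np.asarray(levels, dtype=float)
    nearest = np.argmin(np.abs(x_hat[:, None] - valid[None, :]), axis=1)
    return valid[nearest]

T = np.array([[1, 1],
              [0, 1]])                   # hypothetical unimodular T
z_bar = np.array([4.0, 1.0])             # quantised estimate containing an error
print(T @ z_bar)                         # x_hat = [5, 1]: +5 is not a valid amplitude
print(detect_symbols(z_bar, T))          # +5 is mapped to +3, +1 is kept
```

For a square QAM grid, mapping to the nearest valid amplitude amounts to clipping each component to the outermost level.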

Although lattice reduction can be implemented in several ways, the most common method is the Lenstra-Lenstra-Lovasz (LLL) algorithm. The fundamental structure of the LLL algorithm will now be summarised; a more detailed description can be found in Berenguer et al. and Wubben et al. referred to previously.

The algorithm is applied to an n × m channel matrix, H, which is subjected to QR decomposition such that H = QR:

INPUT: Q, R, P (default P = I_m)
OUTPUT: Q̃, R̃, T
(1) Initialisation: Q̃ = Q, R̃ = R, T = P
(2) k = 2
(3) while k ≤ m
(4)   for l = k−1, ..., 1
(5)     μ = ⌈R̃(l,k) / R̃(l,l)⌋
(6)     if μ ≠ 0
(7)       R̃(1:l, k) = R̃(1:l, k) − μ R̃(1:l, l)
(8)       T(:, k) = T(:, k) − μ T(:, l)
(9)     end
(10)  end
(11)  if δ R̃(k−1,k−1)² > R̃(k,k)² + R̃(k−1,k)²
(12)    swap columns k−1 and k in R̃ and T
(13)    calculate Givens rotation matrix Θ such that element R̃(k,k−1) becomes zero:
          Θ = [a b; −b a], with a = R̃(k−1,k−1) / ‖R̃(k−1:k, k−1)‖ and b = R̃(k,k−1) / ‖R̃(k−1:k, k−1)‖
(14)    R̃(k−1:k, k−1:m) = Θ R̃(k−1:k, k−1:m)
(15)    Q̃(:, k−1:k) = Q̃(:, k−1:k) Θ^T
(16)    k = max{k−1, 2}
(17)  else
(18)    k = k+1
(19)  end
(20) end

It will be noted that, as first used in line 5 of the above routine, the notation ⌈x⌋ denotes the nearest integer to x.
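The listing above can be sketched in Python with NumPy as follows. This is an illustrative translation under stated assumptions, not a reference implementation: H is real-valued, indices are 0-based (so the text's k = 2 becomes k = 1), and the function name `lll_reduce` is hypothetical. The counter `passes` counts executions of the WHILE loop, i.e. LR/CE processes as counted in figure 3.

```python
import numpy as np

def lll_reduce(H, delta=0.75):
    """Conventional (WHILE-loop) LLL on the QR decomposition of real H.
    Returns Q, R, T with H @ T == Q @ R, and the number of WHILE-loop passes."""
    Q, R = np.linalg.qr(H)
    m = R.shape[1]
    T = np.eye(m, dtype=int)
    k = 1                                   # 0-based equivalent of k = 2
    passes = 0
    while k < m:                            # 0-based equivalent of k <= m
        passes += 1
        # Length reduction (lines 4-10): reduce column k against columns k-1, ..., 0
        for l in range(k - 1, -1, -1):
            mu = int(np.round(R[l, k] / R[l, l]))
            if mu != 0:
                R[:l + 1, k] -= mu * R[:l + 1, l]
                T[:, k] -= mu * T[:, l]
        # Column exchange test (line 11)
        if delta * R[k - 1, k - 1] ** 2 > R[k, k] ** 2 + R[k - 1, k] ** 2:
            R[:, [k - 1, k]] = R[:, [k, k - 1]]
            T[:, [k - 1, k]] = T[:, [k, k - 1]]
            # Givens rotation restoring the upper-triangular form (lines 13-15)
            a, b = R[k - 1, k - 1], R[k, k - 1]
            G = np.array([[a, b], [-b, a]]) / np.hypot(a, b)
            R[k - 1:k + 1, k - 1:] = G @ R[k - 1:k + 1, k - 1:]
            Q[:, k - 1:k + 1] = Q[:, k - 1:k + 1] @ G.T
            k = max(k - 1, 1)               # line 16: step back after an exchange
        else:
            k += 1                          # line 18: move on to the next column
    return Q, R, T, passes

rng = np.random.default_rng(1)
H = rng.standard_normal((8, 8))             # same size as the real form of a 4x4 complex channel
Q, R, T, passes = lll_reduce(H)
print(passes)                                # data dependent, as discussed for figure 3
```

Because every column exchange moves k backwards, `passes` varies from one channel realisation to the next. This is exactly the unpredictable runtime that the following paragraphs discuss.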

It should also be appreciated by the skilled person that, in the example provided in Wubben et al., δ = 3/4. Complex versions of lattice reduction also exist, for example as discussed in Berenguer et al.

The LLL algorithm has two distinct parts. The first part, known as length reduction (LR), is set out in lines 4 to 10, while the second part, known as column exchange (CE), runs from lines 11 to 19. The abbreviation "LR/CE" represents one process in which one LR and one CE are run. When using the LLL algorithm as described above, this corresponds to one execution of the WHILE loop of lines 3 to 20.

The structure of this algorithm in implementation is illustrated in figure 2, which shows the execution flow of the original LLL algorithm. The example used is a case in which the input matrix R of the equation H = QR has m = 4 columns, but it will be recognised that any other number m ≥ 2 is possible for real and complex matrices.

After each "LR/CE" process (i.e. each pass through the WHILE loop), the counter k is either increased by one (line 18 of the process) or decreased by one (line 16 of the process), subject to k = 2 being the minimum value of k. Since the column exchanges depend on the input matrix, the runtime, execution flow and therefore complexity will vary.

Figure 3 illustrates this execution flow for a 4x4 complex random matrix H with QR decomposition H = QR. In this example, 12 WHILE loops of the LLL algorithm are executed with 6 column exchanges. After each exchange k is decreased, thereby prolonging the process. The variable execution flow is the main source of difficulty in implementation.

Accordingly, execution of this algorithm can present difficulties, as the computational complexity depends on the input matrix. Usually, the algorithm is implemented as a program, executed on a computer, that runs the algorithm in a WHILE loop until it terminates. Since the number of loops to be run is not known beforehand, the runtime is variable. Whenever a column exchange is made, the counter k is decreased. This increases the runtime, since the algorithm is then further away from completion.

An aspect of the invention provides a lattice reduction assisted detector comprising at least one operational unit, said operational unit being operable to apply a length reduction and/or column exchange operation on input data presented as a matrix, and a controller operable to manage use of said at least one operational unit in a looping pipeline.

The present aspect of the invention may further provide a plurality of such operational units. In such a case, said controller may be operable to arrange said operational units into a pipeline of length appropriate to the application required.

The present aspect of the invention may further provide a further operational unit, operable to apply a length reduction operation to input data presented as a matrix, wherein the controller is operable to employ the further operational unit following the looping pipeline.

The present aspect of the invention may further provide a pre-processing unit operable to apply to input data presented as a matrix a pre-processing operation comprising a column sorting of the input matrix, wherein the controller is operable to employ the pre-processing unit prior to presentation of data to said looping pipeline.

The controller may be operable to run said looping pipeline through a pre-determined number of loops. Alternatively, said controller may be operable to run said looping pipeline until a predetermined condition is met.

Another aspect of the invention provides a communications apparatus incorporating receiving means including the lattice reduction aided detector of the above aspect of the invention. The communications apparatus may be any of an access point, a mobile terminal, or a base station.

A further aspect of the invention provides a lattice reduction aided detection process, comprising applying to input data presented as a matrix a number of processing steps, each processing step comprising either a length reduction or a column exchange on said matrix, and repeating said applying step as necessary to obtain a detected signal in a reduced lattice.

Preferably, the number of processing steps in said application step is determined on the basis of the constellation employed in transmission of the received input data.

A further aspect of the invention provides a lattice reduction aided detection process, comprising the steps of providing a plurality of processing units, each being capable of applying a length reduction or a column exchange to an input signal presented as a matrix, assembling from at least one of said processing units a processing pipeline with an input and an output, applying to said processing pipeline input an input signal presented as a matrix and, as necessary, re-presenting to said input a signal output at the output of the processing pipeline, for lattice reduction of the input signal.

The step of re-presenting may be executed a pre-determined number of times.

Alternatively, the step of re-presenting may be executed on condition that lattice reduction remains incomplete. The step of re-presenting can be omitted if not required by reference to the completeness of the lattice reduction. The completeness of lattice reduction may be determined by a completeness criterion.

Another aspect of the invention comprises a method of establishing wireless communication, including receiving a signal, and applying to said signal a lattice reduction aided detection process as set out in one of the above aspects of the invention.

Further aspects and advantages of the invention will be identified by the skilled person on reading the following description of specific embodiments of the invention. In particular, the reader will understand that the invention can be implemented as hardware, software, or a combination of both. In one embodiment, an aspect of the invention is implemented as hardware, with defined processing capabilities allocated to the prior recited plurality of processing units. Control of these can be implemented thereafter by software means executed on a general purpose computer. Such software can be supplied by means of a program carried on a general purpose carrier means, such as a computer readable storage medium, a signal, or a more hardware specific arrangement such as a ROM of any type.

The software product may be sufficient to implement the controller as set out above, or may instead be arranged to implement the whole of an aspect of the invention on a general purpose computer means.

The invention will now be explained further by the following description of a specific embodiment thereof, accompanied by the following drawings, in which:
Figure 1 is a diagram of a constellation used in the prior art example set out above and with the specific embodiments of the invention;
Figure 2 is a diagram showing implementation of a conventional lattice reduction process in accordance with the LLL algorithm;
Figure 3 is a diagram showing execution flow of the process illustrated in figure 2;
Figure 4 is a schematic diagram of a communications network including a receiver in accordance with a specific embodiment of the invention;
Figure 5 is a schematic diagram of the receiver illustrated in figure 4;
Figure 6 is a schematic diagram of a lattice reduction aided detector in the receiver illustrated in figure 5;
Figure 7 is a schematic diagram of a processing element of the lattice reduction aided detector illustrated in figure 6;
Figure 8 is a diagram showing execution flow of a process implemented by the lattice reduction aided detector illustrated in figure 6;
Figure 9 is a flow diagram illustrating the process whose execution flow is illustrated in figure 8;
Figure 10 is a schematic diagram of a lattice reduction aided detector in accordance with an alternative embodiment of the invention;
Figure 11 is a schematic diagram of a lattice reduction aided detector in accordance with a further alternative embodiment of the invention;
Figure 12 is a schematic diagram of a lattice reduction aided detector in accordance with a further alternative embodiment of the invention;
Figure 13 is a schematic diagram of a lattice reduction aided detector in accordance with a further alternative embodiment of the invention;
Figure 14 is a schematic diagram of a lattice reduction aided detector in accordance with a further alternative embodiment of the invention;
Figure 15 is a schematic diagram of a lattice reduction aided detector in accordance with a further alternative embodiment of the invention;
Figure 16 is a schematic diagram of a lattice reduction aided detector in accordance with a further alternative embodiment of the invention;
Figure 17 illustrates a graph of block error rates for trials of said various specific embodiments of the invention;
Figure 18 illustrates a graph of cumulative distribution function for block error rates for trials of said various specific embodiments of the invention; and
Figure 19 illustrates a graph of block error rates for further trials for said various specific embodiments of the invention.

One structural difference from prior art and conventional LLL algorithms that is intended to be implemented by at least some of the specific embodiments of the invention, is that the lattice reduction can be implemented by creating a pipeline that can be easily executed. One particular advantage of specific embodiments of the invention is that execution has a known, non-random complexity.

The present invention will now be described with reference to an implementation thereof for the equalization of a wireless communication system. Figure 4 illustrates such a system, comprising a MIMO data communications system 10 of generally known construction. New components, in accordance with a specific embodiment of the invention, will be evident from the following description.

The communications system 10 comprises a transmitter device 12 and a receiver device 14. It will be appreciated that in many circumstances, a wireless communications device will be provided with the facilities of a transmitter and a receiver in combination but, for this example, the devices have been illustrated as one way communications devices for reasons of simplicity.

The transmitter device 12 comprises a data source 16, which provides data (comprising information bits or symbols) to a channel encoder 18. The channel encoder 18 is followed by a channel interleaver 20 and, in the illustrated example, a space-time encoder 22. The space-time encoder 22 encodes an incoming symbol or symbols as a plurality of code symbols for simultaneous transmission from a transmitter antenna array 24 comprising a plurality of transmit antennas 25. In this illustrated example, three transmit antennas 25 are provided, though practical implementations may include more or fewer antennas depending on the application.

The encoded transmitted signals propagate through a MIMO channel 28 defined between the transmit antenna array 24 and a corresponding receive antenna array 26 of the receiver device 14. The receive antenna array 26 comprises a plurality of receive antennas 27 which provide a plurality of inputs to a lattice-reduction-aided decoder 30 of the receiver device 14. In this specific embodiment, the receive antenna array 26 comprises three receive antennas 27.

The lattice-reduction-aided decoder 30 has the task of removing the effect of the MIMO channel 28. The output of the lattice-reduction-aided decoder 30 comprises a plurality of signal streams, one for each transmit antenna 25. Each stream carries so-called soft or likelihood data on the probability of a transmitted bit having a particular value. This data is provided to a channel de-interleaver 32, which reverses the effect of the channel interleaver 20. The de-interleaved bits output by the channel de-interleaver 32 are then presented to a channel decoder 34, in this example a Viterbi decoder, which decodes the convolutional code. The output of the channel decoder 34 is provided to a data sink 36 for further processing of the data in any desired manner.

The specific function of the lattice-reduction-aided decoder 30 will be described in due course.

Figure 5 illustrates hardware operably configured (by means of software or application specific hardware components) as the receiver device 14. The receiver device 14 comprises a processor 110 operable to execute machine code instructions stored in a working memory 112 and/or retrievable from a mass storage device 116. By means of a general purpose bus 114, user operable input devices 118 are capable of communication with the processor 110. The user operable input devices 118 comprise, in this example, a keyboard and a mouse. It will be appreciated that any other input devices could also or alternatively be provided, such as another type of pointing device, a writing tablet, speech recognition means, or any other means by which a user input action can be interpreted and converted into data signals.

Audio/video output hardware devices 120 are further connected to the general purpose bus 114, for the output of information to a user. Audio/video output hardware devices can include a visual display unit, a speaker or any other device capable of presenting information to a user.

Communications hardware devices 122, connected to the general purpose bus 114, are connected to the receive antenna array 26. In the illustrated embodiment in Figure 4, the working memory 112 stores user applications 130 which, when executed by the processor 110, cause the establishment of a user interface to enable communication of data to and from a user. The applications in this embodiment establish general purpose or specific computer implemented utilities that might habitually be used by a user.

Communications facilities 132 in accordance with the specific embodiment are also stored in the working memory 112, for establishing a communications protocol to enable data generated in the execution of one of the applications 130 to be processed and then passed to the communications hardware devices 122 for transmission and communication with another communications device. It will be understood that the software defining the applications 130 and the communications facilities 132 may be partly stored in the working memory 112 and the mass storage device 116, for convenience. A memory manager could optionally be provided to enable this to be managed effectively, to take account of the possible different speeds of access to data stored in the working memory 112 and the mass storage device 116.

On execution by the processor 110 of processor executable instructions corresponding with the communications facilities 132, the processor 110 is operable to establish communication with another device in accordance with a recognised communications protocol.

Figure 6 illustrates the lattice reduction aided decoder 30 operable to process input information in accordance with a lattice reduction process, underlying which process is a processing algorithm to be described in due course. The lattice reduction aided decoder comprises a pipeline of elements 40 of equivalent function, differing only in their processing parameters. These functional elements 40 are arranged to perform a stage in the lattice reduction process.

Each element 40 is in the form of a two-stage processing element, as illustrated in figure 7. Such an element applies a length reduction 42 and a column exchange 44 action to a matrix of input values, hence the legends LR and CE in figure 7. In addition to the processed information, the column exchange stage 44 outputs a CE flag. The flag is set if the element has actually applied a column exchange operation to the input data.

The CE flags from the plurality of pipeline elements 40 are together passed to an indicator unit 46 which identifies when all of the elements have processed the input information successively without performance of a column exchange operation. This indicates that the lattice reduction process is complete, and thus stops the looping.
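The following sketches one pipeline element 40 and the indicator unit 46, assuming NumPy and a real-valued channel. The names `lr_ce_element` and `reduce_with_indicator` and the `max_passes` cap are hypothetical. The cap is added only so that the loop in this illustration cannot run forever. Each element returns its CE flag, and the loop stops after the first full pass in which no flag was raised. At that point the output satisfies the LLL conditions for the chosen δ.

```python
import numpy as np

def lr_ce_element(Q, R, T, k, delta):
    """One pipeline element for 0-based column k >= 1: LR stage, then CE stage.
    Updates Q, R, T in place and returns the CE flag."""
    for l in range(k - 1, -1, -1):                        # LR stage 42
        mu = int(np.round(R[l, k] / R[l, l]))
        if mu != 0:
            R[:l + 1, k] -= mu * R[:l + 1, l]
            T[:, k] -= mu * T[:, l]
    if delta * R[k - 1, k - 1] ** 2 > R[k, k] ** 2 + R[k - 1, k] ** 2:   # CE stage 44
        R[:, [k - 1, k]] = R[:, [k, k - 1]]
        T[:, [k - 1, k]] = T[:, [k, k - 1]]
        a, b = R[k - 1, k - 1], R[k, k - 1]
        G = np.array([[a, b], [-b, a]]) / np.hypot(a, b)  # Givens rotation
        R[k - 1:k + 1, k - 1:] = G @ R[k - 1:k + 1, k - 1:]
        Q[:, k - 1:k + 1] = Q[:, k - 1:k + 1] @ G.T
        return True
    return False

def reduce_with_indicator(H, delta=0.75, max_passes=1000):
    """Loop the k = 2..m pipeline until a full pass raises no CE flag (indicator unit).
    max_passes is an illustrative safety cap, not part of the described apparatus."""
    Q, R = np.linalg.qr(H)
    m = R.shape[1]
    T = np.eye(m, dtype=int)
    for passes in range(1, max_passes + 1):
        flags = [lr_ce_element(Q, R, T, k, delta) for k in range(1, m)]
        if not any(flags):                                # no element exchanged columns
            return Q, R, T, passes
    raise RuntimeError("output not LLL reduced within max_passes")

rng = np.random.default_rng(5)
H = rng.standard_normal((6, 6))
Q, R, T, passes = reduce_with_indicator(H)
```

In this mode the number of passes still depends on H. Only the per-pass work is fixed, which matches the caveat that follows about runtime not being bounded.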

With the indicator unit, its output can be used as a termination criterion once the output is Lenstra-Lenstra-Lovasz (LLL) reduced. Furthermore, the status of the output (LLL reduced or not) could be used as additional information when using the results. It should be noted that, when the indicator is used as the termination criterion, the complexity of the implemented algorithm is not constant and the runtime is not guaranteed to have a maximum.

The main idea in the present embodiment of the invention is to always increase the counter k, and hence progress through the matrix from the left-hand side (k = 2) to the right-hand side (k = m). This idea may be extended to progressing in the other direction, i.e. from k = m to k = 2, by always decreasing the counter. The original algorithm may increase (line 18) or decrease (line 16) the counter k, and is finished once k has passed m. The proposed algorithm always increases the counter, so it can be expected that the result is not as good when k = m is reached. Hence, once the end of the matrix has been reached, the procedure can be repeated.

By doing this a predetermined number of times, the complexity and runtime are constant, which makes the process suitable for hardware implementation.

It will also be appreciated that, although the counter is described as increasing, it could equally be designed to decrement. In such a case, the respective impacts of length reduction and column exchange are exchanged. Moreover, instead of or in addition to repeating the procedure, the process could be applied in the reverse direction with the aim of improving on the result of the preceding application. In some circumstances the process can reduce the quality of the outcome, in terms of either accuracy or speed. Nevertheless, the predictability of the time taken to apply lattice reduction is enhanced in the described embodiment.

The process described above, with reference to the apparatus illustrated in figure 6, will now be described in terms of the underlying mathematical algorithm applied to the input data. One fundamental difference from the prior art example given in the introduction, is that, in programming and process related terms, the WHILE-loop is now replaced by a FOR-loop since the counter k is always increased. It should be noted that this FOR-loop (lines 2-16 below) may be repeated several times to improve performance.

INPUT: Q, R, P (default P = I_m)
OUTPUT: Q̃, R̃, T
(1) Initialisation: Q̃ = Q, R̃ = R, T = P
(2) for k = 2 : m
(3)   for l = k−1, ..., 1
(4)     μ = ⌈R̃(l,k) / R̃(l,l)⌋
(5)     if μ ≠ 0
(6)       R̃(1:l, k) = R̃(1:l, k) − μ R̃(1:l, l)
(7)       T(:, k) = T(:, k) − μ T(:, l)
(8)     end
(9)   end
(10)  if δ R̃(k−1,k−1)² > R̃(k,k)² + R̃(k−1,k)²
(11)    swap columns k−1 and k in R̃ and T
(12)    calculate Givens rotation matrix Θ such that element R̃(k,k−1) becomes zero:
          Θ = [a b; −b a], with a = R̃(k−1,k−1) / ‖R̃(k−1:k, k−1)‖ and b = R̃(k,k−1) / ‖R̃(k−1:k, k−1)‖
(13)    R̃(k−1:k, k−1:m) = Θ R̃(k−1:k, k−1:m)
(14)    Q̃(:, k−1:k) = Q̃(:, k−1:k) Θ^T
(15)  end
(16) end

Since the complexity is independent of the number of column exchanges, δ is set equal to 1 in the "CE" process, to allow as many column exchanges as possible: a larger δ increases the probability that the inequality δ R̃(k−1,k−1)² > R̃(k,k)² + R̃(k−1,k)² holds. The choice δ = 0.75 in the original LLL algorithm described above is a trade-off between output quality and complexity.
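The fixed-structure procedure can be sketched in NumPy as below. It runs the FOR loop over k = 2..m (0-based 1..m−1) exactly γ times, with δ = 1 as suggested in the text. The function names and the parameter name `gamma` for the number of repetitions are illustrative assumptions. `steps` counts LR/CE processes and `exchanges` counts raised CE flags.

```python
import numpy as np

def lr_ce_element(Q, R, T, k, delta):
    """One LR/CE step for 0-based column k >= 1; updates in place, returns the CE flag."""
    for l in range(k - 1, -1, -1):                        # length reduction
        mu = int(np.round(R[l, k] / R[l, l]))
        if mu != 0:
            R[:l + 1, k] -= mu * R[:l + 1, l]
            T[:, k] -= mu * T[:, l]
    if delta * R[k - 1, k - 1] ** 2 > R[k, k] ** 2 + R[k - 1, k] ** 2:   # column exchange
        R[:, [k - 1, k]] = R[:, [k, k - 1]]
        T[:, [k - 1, k]] = T[:, [k, k - 1]]
        a, b = R[k - 1, k - 1], R[k, k - 1]
        G = np.array([[a, b], [-b, a]]) / np.hypot(a, b)
        R[k - 1:k + 1, k - 1:] = G @ R[k - 1:k + 1, k - 1:]
        Q[:, k - 1:k + 1] = Q[:, k - 1:k + 1] @ G.T
        return True
    return False

def fixed_sweep_lll(H, gamma, delta=1.0):
    """Repeat the FOR-loop pipeline gamma times; the amount of work never depends on H."""
    Q, R = np.linalg.qr(H)
    m = R.shape[1]
    T = np.eye(m, dtype=int)
    steps = exchanges = 0
    for _ in range(gamma):                 # repetition of the whole pipeline
        for k in range(1, m):              # k = 2..m in the 1-based notation of the text
            exchanges += lr_ce_element(Q, R, T, k, delta)
            steps += 1
    return Q, R, T, steps, exchanges

rng = np.random.default_rng(2)
H = rng.standard_normal((8, 8))
Q, R, T, steps, exchanges = fixed_sweep_lll(H, gamma=3)
print(steps, exchanges)                    # steps is always 3 * (8 - 1) = 21
```

Whatever the channel, `steps` equals γ(m − 1). Only the number of exchanges, and therefore the quality of the output, depends on H.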

Figure 8 is similar to figure 2, in that it shows the execution flow, in this case of the algorithm implemented by the specific embodiment of the invention. Again, the example is based on an input matrix R of the equation H = QR with m = 4 columns, but again any other number m ≥ 2 is possible for real and complex matrices. The counter k no longer depends on the execution of column exchanges. The apparatus is implemented substantially as a pipeline, with consequent operational advantages. For every "LR/CE" process in the pipeline structure, the value of the counter k is known in advance. The implementation of these structures is therefore easier and of lower complexity than in the original LLL algorithm.

This is supported by the implementation, illustrated in figure 9, of the lattice reduction aided decoder 30 illustrated in figure 6. The figure 9 implementation includes three processing elements 40, for k = 2:4.

Lines 2 and 9 in the above process correspond with the arrow 50 in figure 9, as they form a loop to repeat the pipeline γ − 1 times. This can be implemented either with γ pipeline structures in series, or with a loop repeating one pipeline, or with combinations of both possibilities. This makes the present process particularly suitable for hardware implementation: several instances of the pipeline element 40 can be provided in hardware and, if more are required, these can be constructed into loops. Furthermore, using this arrangement, the usual advantages of pipelining, such as the concurrent throughput of multiple tasks, can be achieved.

It should also be noted that the presented calculations concerning Q, R and T in both length reduction and column exchange could be supplemented by a technique previously described in UK Patent Application 0614088.3. In that technique, the inverse of T and/or the product $T^{-1}\mathbf{1}$ (row sum) can be computed at the same time as lattice reduction. In general terms, such an approach involves applying the same operations to a calculation matrix as to the input matrix of values, the calculation matrix initially being set to an identity matrix of appropriate size.
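One way to maintain $T^{-1}$ and $T^{-1}\mathbf{1}$ alongside T is sketched below. This is a NumPy sketch under an assumption: it is not taken from the cited application, whose details are not given here. Each elementary column operation on T is mirrored by the corresponding inverse row operation on a calculation matrix initialised to the identity. The helper names are hypothetical, and the operation sequence is arbitrary rather than the output of a real lattice reduction.

```python
import numpy as np

def subtract_column(T, T_inv, s, k, l, mu):
    """LR-style update T(:,k) -= mu T(:,l), mirrored on T^{-1} and on s = T^{-1} 1."""
    T[:, k] -= mu * T[:, l]            # T <- T E,  with E = I - mu e_l e_k^T
    T_inv[l, :] += mu * T_inv[k, :]    # T^{-1} <- E^{-1} T^{-1}, E^{-1} = I + mu e_l e_k^T
    s[l] += mu * s[k]                  # the same row operation on the row-sum vector

def swap_columns(T, T_inv, s, i, j):
    """CE-style swap of columns i, j of T; the inverse swaps rows i, j of T^{-1} and s."""
    T[:, [i, j]] = T[:, [j, i]]
    T_inv[[i, j], :] = T_inv[[j, i], :]
    s[[i, j]] = s[[j, i]]

m = 4
T = np.eye(m, dtype=int)
T_inv = np.eye(m, dtype=int)           # calculation matrix, initialised to the identity
s = np.ones(m, dtype=int)              # T^{-1} 1 for T = I
# An arbitrary sequence of LR-style and CE-style operations (l < k, as in the LLL algorithm).
subtract_column(T, T_inv, s, 1, 0, 2)
swap_columns(T, T_inv, s, 1, 2)
subtract_column(T, T_inv, s, 3, 2, -1)
subtract_column(T, T_inv, s, 2, 0, 1)
swap_columns(T, T_inv, s, 2, 3)
print((T @ T_inv == np.eye(m, dtype=int)).all())   # True
```

Each update touches only one or two rows, so no explicit matrix inversion is needed. The vector `s` directly supplies the $T^{-1}\mathbf{1}$ term used in the quantisation step.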

Comparing figure 9 with figure 2, it will be appreciated by the skilled reader that an advantage delivered by the illustrated specific embodiment of the invention, is that it provides a lattice reduction in an implementation that is of lower complexity and fixed runtime, and is easier to implement in a hardware architecture.

Simulation of the complexity, in the sense of counted loops (i.e. "LR/CE" processes), will be described and illustrated in due course.

In accordance with a second specific embodiment of the invention, figure 10 illustrates an additional processing unit which can be inserted after the final pipeline element 40 to further approximate the implemented process to the LLL algorithm.

After every column exchange, the original LLL algorithm performs a "length reduction" process to reduce the size of the basis vector thereafter. The fixed structure of the first embodiment allows column exchanges without the need for "length reduction" processes to follow. In the case that, in the last execution of the pipeline, a column exchange was performed, it may be desirable to improve data quality by defining an optional closing procedure to be run after the last pipeline element. The general structure is shown in figure 10.

In this embodiment, a pipeline 60 of LR elements 62 is applied to the output of the lattice reduction aided decoder 30 of the first embodiment. Each LR element 62 differs from the pipeline elements 40 of the decoder in that it only applies a length reduction operation.
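The closing stage of LR elements 62 can be sketched as a length reduction of every column with no column exchange. The sketch assumes a real-valued upper triangular R from H = QR and T as the accumulated integer unimodular transform; the function name is hypothetical. Because reducing column k only reads columns l < k and only writes column k, the columns can be processed in any order, which suits a fixed pipeline of LR-only elements.

```python
def closing_length_reduction(R, T):
    """Hypothetical LR-only closing stage: length-reduce every column k >= 1 of the
    real upper triangular R (in place), mirroring each step in T; no column exchanges."""
    m = len(R)
    for k in range(1, m):                  # one LR element 62 per column
        for l in range(k - 1, -1, -1):     # l = k-1 down to 0
            mu = round(R[l][k] / R[l][l])
            if mu != 0:
                for i in range(l + 1):     # R is upper triangular: rows 0..l only
                    R[i][k] -= mu * R[i][l]
                for row in T:
                    row[k] -= mu * row[l]
```

After this stage every off-diagonal entry satisfies |R[l][k]| ≤ |R[l][l]|/2, which is exactly the condition that a final column exchange in the main pipeline may have left violated.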

The simulation results show significantly better performance with this element attached to the pipeline.

In respect of the original LLL algorithm, it is known that providing a sorted input matrix will reduce the average number of required loops. This can be done by sorting the columns of the matrix H according to their norm (as described, for example, in C. Windpassinger, R. F. H. Fischer and J. B. Huber, "Lattice reduction aided pre-coding", IEEE Transactions on Communications, Dec. 2004, Vol. 52, Issue 12, pp. 2057-2060), beginning with the smallest norm in the left column. This leads to fewer column exchanges and therefore to fewer WHILE loops. The "sorted Gram-Schmidt QR decomposition", described in Wübben et al. referenced above, is another pre-processing procedure which may be appropriate; it is known to be more computationally complex but more efficient than column sorting.
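Column norm sorting itself is simple. A Python sketch follows, assuming H is given as a list of rows (real or complex entries) and that the returned permutation is kept so the order of the detected symbols can be restored afterwards. The function name is hypothetical, and this is plain norm sorting, not the sorted Gram-Schmidt QR decomposition.

```python
import math

def sort_columns_by_norm(H):
    """Return (H_sorted, order): columns of H reordered by increasing Euclidean norm,
    smallest norm first (leftmost); order[j] is the original index of column j."""
    m = len(H[0])
    norms = [math.sqrt(sum(abs(row[j]) ** 2 for row in H)) for j in range(m)]
    order = sorted(range(m), key=lambda j: norms[j])
    H_sorted = [[row[j] for j in order] for row in H]
    return H_sorted, order
```

Since the columns of H correspond to transmit streams, the detected vector has to be permuted back using order before it is passed on.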

To enhance the performance of the new technique it will be understood that the same pre-processing can be applied to the input arguments before starting the lattice reduction algorithm in accordance with the described embodiments.

As set out in the preceding disclosure, all or at least one of the specific embodiments of the invention provide certain operational differences from an embodiment implementing a LLL lattice reduction algorithm in accordance with the prior art. In particular, a guaranteed constant runtime can be achieved. Also, the embodiments enable a fixed (i.e. hardware) structure to be used to compute the "length reduction" processes since "k" is defined in advance. Otherwise, a more flexible approach would be required, possibly involving software and a more sophisticated processing capability, as the exact nature of the processing block depends on the current value of the parameter "k".

Moreover, using the pipelining structure, several lattice reductions can be processed at once, with different lattice reductions at different stages of the pipeline. Further, an approximation of the standard LLL algorithm can be achieved with fixed (hardware) structure elements.

In terms of the above examples of specific embodiments of the invention, simulations have been performed to determine how the implementations of the reduced complexity LLL approximation set out above compare in performance with the ideal (but computationally complex) case. To this end, a 4x4 antenna MIMO OFDM system was modelled, with 64-QAM modulation, a rate 5/6 convolutional code and Reduced Lattice MMSE detection. IEEE 802.11n channel models B and E were simulated as set out in "TGn channel models" (V. Erceg et al., IEEE 802.11-03/940r4, May 2004, http://www.802wirelessworld.com).

The following structures were used in the simulations:
- the original complex LLL algorithm for δ = 3/4 and δ = 1;
- the LLL implementation within a pipeline as shown in figure 6, for δ = 3/4 and δ = 1;
- several types of fixed complexity lattice reduction pipelines (as illustrated in figures 11 to 16), namely:
  a. 4(m-1) pipeline (figure 11)
  b. 4(m-1) pipeline with additional length reduction afterwards (figure 12)
  c. 4(m-1)-2 pipeline with reverse pipeline element (figure 13)
  d. 4(m-1)-2 pipeline with reverse pipeline element plus additional length reduction (figure 14)
  e. 5(m-1) pipeline (figure 15)
  f. 5(m-1)-2 pipeline with reverse pipeline element (figure 16)

The notation "4(m-1)" refers to 4 repeated for-loops, each involving m-1 LR/CE operations. Table I sets out the total number of fixed structure LR/CE processes (plus any required additional LR processes) in comparison with the WHILE loops that would be required in the LLL algorithm.

Table I

Pipeline            a    b          c    d          e    f
LR/CE processes     12   12 + LR    10   10 + LR    13   15

First, the simulated lattice reduction techniques are compared in terms of Block Error Rate. The results of this analysis are set out in figure 17.

It can be observed that the simulated original LLL algorithm, the LLL implementation with pipeline structures and pipeline (e) are of the same quality. From this, it can be concluded that the fixed complexity pipeline structure of the specific embodiment operates at an acceptable Block Error Rate, as it achieves the same quality as the standard LLL algorithm.

Pipelines (a) and (c) give the worst results in this test. This results from their limitation to 12 and 10 "LR/CE" processes respectively: figure 18 shows that the median of the original LLL algorithm is approximately 11 loops (δ = 3/4).

Pipelines (b) and (d) are essentially the same as (a) and (c) respectively, with the addition of a length reduction process. Both come very close (within about 0.25 dB) to the Block Error Rate curve of the original LLL algorithm. This shows that appending this length reduction process to the end of the fixed structure pipeline can improve performance where the basic implementation does not achieve the desired performance.

In figure 18 the complexity of the structures presented above is shown. The number of loops executed in completion of the lattice reduction is a measure of complexity, as twice the number of loops means that the algorithm in question will require twice the execution time.

This measure carries across to the "LR/CE" structure introduced above, in that the calculations are fundamentally the same, although the LR/CE pipeline structure 40 should be faster in calculation time since the complexity (for instance the parameter k) is known in advance for each calculation. The Cumulative Distribution Function (CDF) of the number of "loops" is illustrated in figure 18.
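To make the loop count concrete, the following Python sketch runs a textbook real-valued LLL on an upper triangular R and returns the number of WHILE iterations, the quantity whose distribution figure 18 plots for the original algorithm. It is an illustration under stated assumptions (real-valued R, full length reduction in every iteration, Q and T not updated, hypothetical function name), not the simulated implementation. Unlike the fixed pipeline, the count depends on the data, because k moves back after every column exchange.

```python
import math

def lll_while_loop_count(R, delta=0.75):
    """Textbook LLL on a real upper triangular R (modified in place).
    Returns the number of WHILE-loop iterations executed."""
    m = len(R)
    k, loops = 1, 0                        # 0-based: k = 1 is the text's k = 2
    while k < m:
        loops += 1
        # Length reduction of column k against columns k-1 .. 0.
        for l in range(k - 1, -1, -1):
            mu = round(R[l][k] / R[l][l])
            if mu != 0:
                for i in range(l + 1):
                    R[i][k] -= mu * R[i][l]
        a, b, c = R[k - 1][k - 1], R[k - 1][k], R[k][k]
        if delta * a * a > b * b + c * c:  # Lovasz condition violated
            for row in R:
                row[k - 1], row[k] = row[k], row[k - 1]
            # Givens rotation on rows k-1 and k to restore triangular form.
            r = math.hypot(b, c)
            cs, sn = b / r, c / r
            for j in range(k - 1, m):
                u, v = R[k - 1][j], R[k][j]
                R[k - 1][j], R[k][j] = cs * u + sn * v, -sn * u + cs * v
            k = max(k - 1, 1)              # data-dependent step back
        else:
            k += 1
    return loops
```

Running this over many random channel realisations and recording the returned counts would give an empirical CDF of the kind shown in figure 18.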

It can be observed that pipelines (a), (c), (e) and (f) (figures 11, 13, 15 and 16) have a constant number of loops (as implemented). Since the additional "length reduction" (LR) calculation is not counted in this analysis, the real calculation time for pipelines (b) and (d) (figures 12 and 14) will be slightly higher.

One significant observation is that the LLL implementation as a pipeline structure outperforms the original LLL algorithm in both cases, δ = 3/4 and δ = 1. This shows that it is possible to limit the LLL pipeline implementation to, say, 6(m-1) loops and still obtain completed (i.e. Lenstra-Lenstra-Lovasz-reduced) output matrices 98% of the time (example not shown in figure 18), while after 5(m-1) loops (broken line with square points) approximately 80% of the outputs are LLL-reduced; the latter is nonetheless an appropriate comparison, as it has the same Block Error Rate curve as the LLL algorithm in figure 17.
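Deciding whether an output matrix counts as completed can be sketched as a direct test of the two LLL conditions on R. The sketch assumes a real-valued upper triangular R (for complex matrices the size condition is usually applied to the real and imaginary parts separately); the function name and tolerance are hypothetical. A test of this kind could also serve as a completeness criterion of the sort mentioned in claim 14.

```python
def is_lll_reduced(R, delta=0.75, tol=1e-9):
    """True if the real upper triangular R satisfies the LLL size condition
    |R[l][k]| <= |R[l][l]|/2 for l < k, and the Lovasz condition
    delta*R[k-1][k-1]**2 <= R[k-1][k]**2 + R[k][k]**2, up to a small tolerance."""
    m = len(R)
    for k in range(1, m):
        if any(abs(R[l][k]) > 0.5 * abs(R[l][l]) + tol for l in range(k)):
            return False
        if delta * R[k - 1][k - 1] ** 2 > R[k - 1][k] ** 2 + R[k][k] ** 2 + tol:
            return False
    return True
```

Counting how often this returns True over many channel realisations, after a given number of pipeline traversals, yields the kind of completion percentage quoted above.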

Figure 19 illustrates further comparative Block Error Rate curves for simulated implementations of specific embodiments of the invention; details of the specific examples are set out in the legend of figure 19. These curves show that applying a column norm sorting pre-processing step to the H matrix before QR decomposition (H = QR) has an impact on performance. For instance, pipeline (a) is illustrated with and without the pre-sorting step: with pre-sorting of H, there is a clear gain in performance (reduced Block Error Rate) over pipeline (a) alone.

It will be appreciated that lattice reduction has many possible applications, for example in precoding and in MIMO decoding in wireless communications. The examples and simulations provided in the specific embodiments described above are specifically for a MIMO communication system, though the skilled reader will appreciate that lattice decoding can be used in various applications.

Claims

1. A lattice reduction assisted detector comprising at least one operational unit, said operational unit being operable to apply a length reduction and/or column exchange operation on input data presented as a matrix, and a controller operable to manage use of said at least one operational unit in a looping pipeline.
2. A detector in accordance with claim 1 and including a plurality of said operational units.
3. A detector in accordance with claim 2 wherein said controller is operable to arrange said operational units into a pipeline of length appropriate to the application required.
4. A detector in accordance with any one of the preceding claims and including a further operational unit, operable to apply to input data presented as a matrix, a length reduction operation, and wherein the controller is operable to employ the further operational unit following the looping pipeline.
5. A detector in accordance with any one of the preceding claims and including a pre-processing unit operable to apply to input data presented as a matrix a pre-processing operation comprising a column sorting of the input matrix, wherein the controller is operable to employ the pre-processing unit prior to presentation of data to said looping pipeline.
6. A detector in accordance with any one of the preceding claims wherein the controller is operable to run said looping pipeline through a pre-determined number of loops.
7. A detector in accordance with any one of claims 1 to 6 wherein said controller is operable to run said looping pipeline until a predetermined condition is met.
8. A communications apparatus incorporating receiving means including the lattice reduction aided detector in accordance with any one of the preceding claims.
9. A lattice reduction aided detection process, comprising applying to input data presented as a matrix a number of processing steps, each processing step comprising either a length reduction or a column exchange on said matrix, and repeating said applying step as necessary to obtain a detected signal in a reduced lattice.
10. A lattice reduction aided detection process in accordance with claim 9 wherein the number of processing steps in said application step is determined on the basis of the constellation employed in transmission of the received input data.
11. A lattice reduction aided detection process, comprising the steps of providing a plurality of processing units, each being capable of applying a length reduction or a column exchange to an input signal presented as a matrix, assembling from at least one of said processing units a processing pipeline with an input and an output, applying to said processing pipeline input an input signal presented as a matrix and, as necessary, re-presenting to said input a signal output at the output of the processing pipeline, for lattice reduction of the input signal.
12. A lattice reduction aided detection process in accordance with claim 11 wherein the step of re-presenting is executed a pre-determined number of times.
13. A lattice reduction aided detection process in accordance with claim 11 wherein the step of re-presenting is executed on condition that lattice reduction remains incomplete, including not executing the step of re-presenting if lattice reduction is complete on the first performance of said applying step.
14. A lattice reduction aided detection process in accordance with claim 12 and further including comparing execution of the applying step against a completeness criterion to determine the completeness of the lattice reduction.
15. A method of establishing wireless communication, including receiving a signal, and applying to said signal a lattice reduction aided detection process in accordance with any one of claims 9 to 14.
16. A computer program product comprising computer executable instructions which, in use, will cause a general purpose computer apparatus to become configured as a detector in accordance with any one of claims 1 to 8.
17. A computer program product comprising computer executable instructions which, in use, will cause a computer comprising at least one operational unit, said operational unit being operable to apply a length reduction and/or column exchange operation on input data presented as a matrix, to implement further a controller operable to manage use of said at least one operational unit in a looping pipeline.
18. A computer program storage medium storing a computer program product in accordance with claim 16 or claim 17.
19. A computer receivable signal bearing a computer program product in accordance with claim 16 or claim 17.