GB2456319A

GB2456319A - Data processing apparatus for adjusting the inner product of first and second vectors using orthogonal transformations.

Info

Publication number: GB2456319A
Application number: GB0800428A
Authority: GB
Inventors: David Milford
Original assignee: Toshiba Research Europe Ltd
Current assignee: Toshiba Europe Ltd
Priority date: 2008-01-10
Filing date: 2008-01-10
Publication date: 2009-07-15
Anticipated expiration: 2028-01-10
Also published as: GB0800428D0; GB2456319B

Abstract

The invention relates to a data processing apparatus which applies orthogonal transformations to first and second vectors based on the value of their inner product. The apparatus comprises an array of processing elements which iteratively apply plane rotations to pairs of corresponding components of the first and second vectors, and a control element which calculates their inner product and provides a control signal to the array based upon it. The control signal is used by the processor array to indicate the sense of each of a succession of diminishing rotations to be performed on the vectors. Further described is a cross coupled CORDIC processing element which operates on signals representing complex data in successive real and imaginary rotation operations.

Description

1 2456319 A Data Processing Device And Method This invention relates to data processing apparatus for processing data presented in the form of vectors. In particular, it relates to applying orthogonal transformations to vector data to adjust the inner products of pairs of vectors.

Applying transformations to adjust the inner products of pairs of vectors finds use in many technical areas. One use is in the performance of orthogonal transformations by which pairs of vectors may be rendered mutually orthogonal. These transformations can represent steps in performing singular value decompositions (SVDs) or eigenvalue decompositions (EVDs), which are useful in a wide variety of practical applications.

As is known in the art, orthogonalisation of pairs of vectors iteratively selected from a matrix may render each component vector of that matrix orthogonal to every other component vector. This approach, referred to here as Hestenes' method, is discussed in the paper by M.R. Hestenes, entitled "Inversion of matrices by orthogonalisation and related results", J. SIAM, Vol 6, 1958, pp 5 1-90.

The applicant has observed that hardware that facilitates reduced complexity orthogonalisation of vectors would facilitate implementations of SVDs and EVDs and would be useful in a wide variety of areas. One practical area which would particularly benefit from such is multiple antenna transmissions, such as Multiple Input Multiple Output (MIMO) transmissions. Decoding of such transmissions is facilitated by use of SVD and EVD computations of relatively small matrices under relatively demanding hardware constraints.

An SVD of a matrix A can be defined as: A=UVH, where U and V are unitajy matrices, arid E = diag (at, O2 a3, ...) where a1? a2_ a3 are singular (real) values. The ratio of the largest diagonal value to the smallest diagonal value provides information on the condition of the CSI matrix.

The columns of U and V are known respectively as the left and right singular vectors of A. If A is square, the matrix of eigenvectors E = [e1,e2,e3...] and associated eigenvalues A = diag(21 2 23...) is defined by AE=EA from which follows the eigenvalue decomposition A=EA E' If A is Hermitian, it can be shown that the matrix of eigenvectors E is unitary and the EVD can be written: VA V' This will be understood by the skilled addressee as a special case of an SVD in which the left and the right singular vectors are identical.

It follows from the above definition of the SVD that UHAV=2 and therefore that the matrix A can be reduced to diagonal form by operating from the left and right hand sides with unitary (plane rotation) matrices. Most two-sided Jacobi methods use an iterative procedure in which left and right rotation operators are computed so that, starting with matrix A, zeros are generated, two at a time, in off-diagonal matrix locations. Iterative application of the left and right operators to separate identity matrices results in matrices containing the left and right singular vectors.

If only one of the matrices of singular vectors (say V) is required it is more efficient to use the correlation matrix: AHA = VUHUVH = The product A11A is a Hermitian matrix whose eigenvectors are the right singular vectors of A and whose eigenvalues are the square of the singular values.

The symmetric eigenvalue decomposition provides the basis for computations of V in which the matrix AHA is iteratively reduced to diagonal form by a sequence of two-sided unitary transformations; the matrices in this case are conjugates. Starting with an identity matrix, the unitary operator is applied at each stage of the iteration to obtain increasingly good approximations of the matrix V. It also follows from the definition of the SVD that: AV=U The right-hand side of the equation is a set of mutually orthogonal column vectors with norms equal to the singular values while the left-hand side shows that this set can be obtained by an orthogonal transformation of A. An iterative procedure, in which pairs of columns in A are transformed by a 2x2 unitary operator until all the columns in the transformed matrix are mutually orthogonal, will achieve this result. This procedure is described in Hestenes cited above.

While the above analysis considers the case where matrix operations are carried out from the right, and thus applied on a column basis, the equivalent row based operations from the left will be evident to the reader.

Uses for SVDs and EVDs in MIMO systems include diagnosis of the condition' of the CSI matrix and control of the system to improve the condition of the CS! matrix.

Through these uses, SVDs and EVDs have become important in operating effective MIMO systems.

A MIMO transmission may be characterised by a Channel State Information (CSI) matrix. A M[MO CSI matrix will typically have dimensions in the order of the number of transmission antennas and reception antennas. In an exemplary MIMO system, wherein a transmitter and a receiver each have 4 antenna, the CSI matrix will be a 4x4 matrix of complex elements. Naturally, the complex nature of the elements of the matrix increases the demands on any decoder or any element thereof.

An example of an application of SVDs is the use of beam forming in MIMO transmissions. A conventional communications system can be represented mathematically as: y = Hx+ v in which y is an n-by-i vector representing the received signal, H is an n-by-rn channel matrix modelling the transmission characteristics of the communications channel, x is an rn-by-I vector representing transmit symbols, v is an n-by-I noise vector and wherein rn and n denote the number of transmit and receive antennas respectively.

If the transmitted symbol vector x is modified by operating upon it from the left with a unitary matrix V, then the transmission (ignoring noise) becomes y=HVx = UVHVX = UEx V can be termed a pre-coding matrix. The transmitted symbol can therefore be recovered easily from the received vector by multiplying from the left by UH and using the singular values. Discovery of a particularly small singular value might prompt a decision to cease transmission on the corresponding antenna. Alternatively, transmission might be confmed to fewer streams than there are antennas by inspection of singular vectors.

Diagnostics generally focus on singular values, resulting from a singular value decomposition of a CSI matrix.

The Jacobi method of orthogonalising vectors involves applying progressive plane rotations. An orthogonalisation transformation, such as B = Q.A (where Q.QT = QT*Q = I) can be implemented as a sequence of plane rotations: Q = Qi.Q.Q QN known as Givens or Jacobi rotations, where Q is based on an identity matrix in which four of the elements are replaced as follows: cosB stn9 -sinG cos9 The selection of the four elements to be used determines in which plane (in multi-dimensional space) the vectors are rotated.

A Jacobi method may be two-sided or one-sided. In a two-sided Jacobi method, operations are performed from both the left and right sides of the object matrix with the aim or rendering off-diagonal elements of the object matrix zero or approximately zero.

In a one-sided Jacobi method, orthogonal transformations are applied either from the left or right side (but not both) with the aim of rendering, respectively, rows or columns of the object matrix mutually orthogonal or approximately mutually orthogonal.

A Jacobi transformation may be represented as transformations applied to corresponding coordinate pairs of the two vectors. The transformation [x'] -cos9 sin 9Jx Ly'i -sinG cos °iLy on a 2-vector can be represented diagrammatically as a plane rotation which leaves the vector magnitude unchanged. This is illustrated in Figure 1.

This transformation may be represented, in relation to a pair of real row vectors, as: [ x x1 _F cosG sin 11-x2 i -L-sinG cos ej[y, y2 Under this transformation, the magnitude of each (Xj,yj) column vector formed from corresponding elements of the row vectors will be preserved. This is illustrated in Figure 2. Clearly, the magnitudes of the row vectors (xi.... x) and (Yi y) are not preserved by the transformation.

Any rotation may be defined by a sense' of rotation and a degree' or magnitude' of rotation. The sign of the off-diagonal (-sinG and sinG) terms in the transformation above determine the sense' of rotation applied and 9 determines the degree or magnitude of rotation. By this transformation, each vector (x,, y) experiences plane rotation through an angle 9.

By appropriate choice of 9, an element of a matrix of constituent vectors may be set to zero: x1' x2' x3' X' _[ cosG sin oTxi x2 x3 x o y2' y3' .. y' [_sinG cosej[yi)2 YP? In addition to this, the applicant has observed that, as the accumulated rotation traverses through a full circle (i.e. 3600 or 2m radians), the inner product of the two row vectors (x and y) varies sinusoidally through two complete cycles. Thus, through a complete revolution, the inner product passes through zero four times -the inner product has four zero crossing points.

It can also be shown that a transformation represented by 9 x3' cos9 sinol[xi x2 x3 y1' y' y' y'j [-sin9 cosej[y1 y2 y3 can orthogonalise the rows of a matrix, i.e. render the inner product zero. This is: x'T.y'=O if tan26=2[ XTY] IxI -? One conventional approach to performing SVDs on matrices makes use of the QR' algorithm. This algorithm is an efficient algorithm generally associated with non-real time applications. The QR algorithm is not based on Jacobi methods. It may not be weilsuited to some applications, such as MIMO and other transmission systems, which often rely on parallelisation of relatively low complexity processors. Jacobi based processes may enable faster implementations on arrays of microprocessors, for example.

This is suggested in "Computation of the singular value decomposition using mesh-connected processors" (B.P. Brent, F.T. Luk and C.F. Van Loan, Journal of VLSI and Computer Systems, Vol. 1, No 3, 1985 pp. 242-270). "Jacobi's method is more accurate than QR" (J. Demmel and K. Veselic; SIAM J. Scientific and Statistical Computing, Vol. 11, 1992 pp 1204-1246) suggests that Jacobi methods are more accurate than the QR algorithm.

The applicant has observed that useful application of S\'D or EVD processes or, more generally, Jacobi processes, to MIIMO and other communication fields will be facilitated by efficient hardware means for performing these processes. For example, MIMO transmission is used to interconnect portable devices. These typically have tight constraints on power consumption and heat dissipation. To apply MIMO diagnostics and control processes that involve SVDs or EVDs to interconnect those devices, it is helpful to provide low complexity and low power consuming processors. The applicant has observed that this may be possible if SVD or EVD processors can be implemented with hardware which involves minimal explicit use of multiplier units.

The applicant has also observed that these processes need only be applicable to relatively small matrices to find useful application in MIMO and other fields.

As discussed above, orthogonalisation of a pair of vectors can be performed by iteratively applying appropriate Jacobi transformations. Orthogonalising vectors using Jacobi transformations involves applying rotations through a degree of rotation 0 selected to cause the pair of vectors affected by the Jacobi matrix to be rendered orthogonal with each other. This follows from the observation that any pair of vectors may be rendered mutually orthogonal by a suitable rotation, defined by a degree of rotation 8 and a sense of rotation.

It follows naturally that the conventional approach to orthogonalising vectors with Jacobi methods, to perform SVDs or EVDs for MIMO and other transmission applications, involves a two step process. In the first step, a suitable degree of rotation o is pre-calculated. In the second step, plane rotations are applied to corresponding components of a pair of vectors relative to each other through the pre-calculated degree of rotation 0. The appropriate rotation is computed; suitable rotations are then applied to corresponding entries of a pair of vectors. Evaluation may be done by a suitable matrix process in which an entry of a vector is zeroed, or made orthogonal to the base of the matrix. This evaluation step typically involves trigonometric calculations. These calculations are typically carried out on a general purpose processor using an approximation sequence. This typically involves significant complexity, power consumption and heat dissipation in hardware for performing SVDs and EVDs.

Some previous implementations of Jacobi methods have used CORDIC processor arrays because they are of relatively low complexity and they enable efficient computation of parallel plane rotations. These follow the'conventional approach involving two steps in which the rotation angle is pre-computed (sometimes using a CORDIC array) and applied subsequently.

"The CORDIC trigonometric computing technique" (J.E. Voider, IRE Trans Electronic Computers, Vol. EC-8, No 3, pp 330-334, 1959) introduces the use of CORDIC computing techniques to reduce the complexity of practical implementations of trigonometric calculations. Moreover, "VLSI implementation of a CORDIC SVD processor" (J.R. Cavallaro et al, Proc. Eighth University/Governmentllndustry Microelectronics Symposium, 1989, pp 256-260) presents an application of CORDIC techniques to S\TD.

An attempt at reducing complexity of hardware implementations of SVDs and EVDs is described in W02006/053340. According to the approach described in that document, Jacobi matrices are computed from identity matrices. That document follows the conventional two step process involving pre-calculation of entries of a Jacobi matrix and then a rotation. That document does not disclose any specific means of computation of the Jacobi matrix. Therefore, the values computed from the current matrix will be understood by those skilled in the art to involve trigonometric calculations to determine 0. As mentioned above, these calculations are computationally intensive and will require a relatively powerful general purpose processor which will inevitably employ multiplier elements.

W02006/053341 describes the use of a CORDIC processor and a look-up table to assist the pre-computation of values of 9 appearing in the Jacobi matrix. Again, the conventional two-step approach is taken.

Another attempt to provide reduced complexity implementations of SVDs or EVDs is described in W02006/083 594. That document describes an array of processors with dimensions that match the structure of a matrix to be treated by Jacobi transformations.

The array is composed of two types of processing elements differentiated by the position in the array: diagonal or off-diagonal. This approach also relies on the conventional approach, involving pre-computation of degrees of rotation followed by plane rotations. Again, calculating the degree of rotation to be applied is, in itself, a relatively complex calculation involving trigonometric approximation sequence calculations and may require relatively complex hardware involving large numbers of multiplier elements.

A further attempt is disclosed in W02006/107700. This involves zeroing separately the real and imaginary parts of off-diagonal elements of the Hermitian inner product of a matrix AHA or AAH in a two sided Jacobi process. This process again involves a relatively high degree of complexity.

The applicant has observed that a number of fields, such as MIMO and other communication schemes, would benefit from a reduced complexity device for adjusting the inner product of vectors.

The applicant has also observed that a number of fields, such as multiple antenna transmission, would benefit from a device facilitating reduced complexity implementation of orthogonal transformations on vectors or matrices.

An aspect of this invention concerns applying orthogonal transformations to pairs of vectors to render them mutually orthogonal. Further, an aspect of this invention relates to performing decompositions of matrices using orthogonal transformations. Yet further, an aspect of this invention relates to diagnostics and/or control of multiple antenna transmission systems using decompositions involving orthogonal transformations.

An aspect of the invention provides a data processing apparatus for processing data defining first and second vectors, the apparatus being operable to adjust the inner product of the vectors, said apparatus comprising an array of processing elements operable to apply iteratively plane rotations to pairs of corresponding components of the first and second vectors; and a control element operable to calculate the inner product of said first and second vectors following processing by said array, and to provide to the array a control signal based on said inner product; wherein the array of processing elements is operable to receive the control signal and to apply successively smaller rotations with the sense of each rotation determined according to the control signal.

Another aspect of the invention provides a CORDIC processing element operable to apply rotational transformations to pairs of corresponding real and imaginary components of first and second vectors defined in complex space, wherein said CORDIC processing element comprises first, second, third and fourth registers each communicating substantially directly with a respective addition/subtraction element and each communicating indirectly with the respective addition/subtraction element of another register through a right shift element, the first and second addition/subtraction element and said third and said fourth addition/subtraction element being so arranged in respective cross coupled pairs, wherein each addition/subtraction element is operable to receive the control signal and either add or subtract according to the control signal and wherein said addition/subtraction elements in said pairs are configured such that when one is configured to add, the other is configured to subtract, and wherein outputs of said first and fourth addition/subtraction elements are configured to feed back respectively to said first and fourth registers and said second and third addition/subtraction elements are cross coupled back respectively to said third and second registers so that real and imaginary parts of rotated vector components can be calculated in said one device in successive phases of operation.

Another aspect of the invention provides data processing apparatus comprising a plurality of CORDIC processing elements in accordance with the preceding aspect of the invention, each being operable on corresponding elements of first and second vectors, and a control element operable to determine respective real and imaginary phase control signals, on the basis of an inner product of said vectors after said successive operations, for said CORDIC processing elements.

Matrix information processing apparatus can be provided which comprises first and second arrays each comprising a plurality of CORDIC processing elements as aforementioned, first and second matrix register means operable to store information defining a matrix to be processed by said first and second arrays, and row selection means operable to select two rows of a matrix for processing, and control means operable to generate control signals on the basis of operation of said first array, said control signals being operable to control both said first aid second arrays of CORDIC processing elements.

In another aspect the present invention provides a data processing apparatus for processing data, the data defining first and second vectors, the apparatus being operable to adjust the inner product the vectors, the apparatus comprising: at least one array of processing elements operable to iteratively apply plane rotations to corresponding elements of the first and second vectors, said plane rotations being applied according to a sense of rotation and a degree of rotation; and one or more control elements operable to calculate the sign of the inner product of said first and second vectors and provide a control signal for the at least one array of processing elements; wherein the at least one array of processing elements is operable to receive the control signal and apply rotations with a sense of rotation determined according to the control signal.

The control signal may indicate the sign of the inner product.

As understood by those skilled in the art, plane rotations applied to corresponding elements of two vectors will adjust the inner product of those vectors. These plane rotations are applied to 2-dimensional vectors formed from two corresponding elements of the first and second vectors.

The term sense of rotation', as used herein, is intended to have a standard mathematical definition. For the purpose of illustration only, the terms clockwise' and counter-clockwise' have opposite sense'. In the context of a plane rotation represented by a Jacobi matrix, the sense of rotation will be defined by the sign of appropriate elements of the matrix, as known to those skilled in the art.

As used herein, the term control signal' is intended to include any means by which a control element may communicate with, or control, a processing element. This could include a register value, a software or firmware variable, the state of a gate, or a voltage.

The array of processing elements may include a processing element provided for each corresponding element of the first and second vectors.

The at least one array may be operable to receive corresponding elements of the first and second vectors.

The at least one array may be operable to apply degrees of rotation selected from a set of possible values.

The set of possible values may comprise values each of which is defined as arctan(20), where n is a non-negative integer. The set may be ordered, commencing at n0.

The array of processing elements may be operable to cease iterations after a predetermined number of degrees of rotation from said sequence has been applied. This may be represented by a given number of elements in the set of possible values of the rotation. This may be represented as a maximum value of n. The applicant has observed that in some fields, vectors may only need to be rendered approximately orthogonal. This invention allows precision of orthogonalising to be traded for processing speed or reduced complexity where this is desirable. This trade-off may be predetermined or adjusted dynamically.

The applicant has observed that if the sense of rotation changes appropriately with iterative plane rotations defined by the sequence arctan(2), then any pair of vectors to which the relative plane rotations are being applied will converge towards orthogonality, or the ilmer product will to converge to zero. Therefore, aspects of this invention provide a device for adjusting the inner product of a pair of vectors towards zero, or adjusting the vectors towards orthogonality, without pre-calculating the degree of rotation 9 In a sense, trigonometric pre-calculations are replaced with the computation of an inner product. This low complexity computation can be implemented efficiently with a dedicated control element comprising a simple hardware configuration. This device may, therefore, facilitate a reduced complexity process for orthogonalising pairs of vectors. The device may, therefore, eliminate the explicit need for processors capable of determining degrees of rotationby trigonometric calculation.

Preferably, a processing element is operable to receive two values as inputs and provide two values as outputs.

A processing element may include a pair of registers each communicating substantially directly with respective addition/subtraction elements and each communicating indirectly with the respective addition subtraction element of the other register through a right-shift element, wherein each subtraction element is operable to receive the control signal and either add or subtract according to the control signal. This may provide an apparatus for implementing plane rotations with controllable sense of rotation.

Preferably, said at least one array includes at least one pair of cross-coupled processing elements interconnected such that one of the outputs of each processing element provides an input for the other processing element.

Preferably, said at least one pair of cross-coupled processing elements are operable to operate in first and second interleaved time phases.

Preferably, said two or more cross-coupled processing elements are operable to apply plane rotations to the complex vectors such that in the said first time phase the real part of the inner product varies and the imaginary part remains substantially constant and in the said second time phase the imaginary part varies and the real part of the inner product remains substantially constant.

Plane rotations which vary the real part of the inner product and allow the imaginary part to remain constant may be defined by: cos9 sin 9jx1 x2 x3 *** x L-sinG cos ej[y, y2 y3 in reference to operation on a 2 x n matrix, where 9 is real and the elements x, andy, are complex.

Plane rotations which vary the imaginary part of the inner product and allow the real part to remain constant may be defméd by: I cos 9 i sin 9Tx1 x2 x3 Ln° cos9][y1 y2 y3 in reference to operation on a 2 x n matrix, where 0 is real and the elements x1 andy' are complex.

Preferably said at least one control element is operable to commence calculation of the sign of the inner product of real parts of complex vectors in said first interleaved time phase and commence calculation of the sign of the inner product of real parts of complex vectors in said second interleaved time phase.

The applicant has observed that plane rotations can be performed so as to vary the real part of the inner product of two complex vectors independently of the imaginary part of those vectors. The applicant has also observed that calculation of the sign of the inner product of the vectors involves a finite time delay. This invention may minimise the impact of this time delay by using a pair of cross-coupled processing elements to perform time interleaved plane rotations effecting adjustments to the real and imaginary parts of the inner product of two complex vectors in distinct interleaved time phases.

Preferably the apparatus includes at least one scaling element adapted to accumulate a scaling factor corresponding to said degree of rotation and apply said scaling factor to said vectors after a number of iterations.

Preferably, a scaling element is adapted to accumulate a scaling factor equal to 11 cos wherein 9 represents the degree of rotation applied by a processing element in each given iteration j.

Preferably, said at least one control element is operable to calculate an inner product and output the value of a predetermined data bit which denotes the sign of the inner product.

Preferably, said at least one control element includes at least one array of multiplier elements provided for the outputs of said processing elements.

Preferably, said at least one control element includes at least one array of addition elements connected to said multiplier elements.

Preferably, said at least one control element includes two or more pipeline registers.

This may divide computation of inner products into two-time phases.

Preferably, the apparatus includes primary and secondary arrays of processing elements, operable to communicate with primary and secondary data stores, wherein each array is operable to perform operations on the respective data store, and wherein the secondary array is operable to perform substantially the same operations on the secondary data store as those performed by the primary array on the primary data store.

Preferably, the apparatus includes at least one initialisation element operable to initialise the primary and secondary data stores. Those skilled in the art will appreciate that the initialisation element may be provided by a variety of known means. These may include an additional data store, a data interface of a host processor.

Preferably, the apparatus includes at least one initialisation element operable to initialise the data store associated with the primary array with a given matrix. This matrix may be a Channel State Information (CSI) matrix characterising a MIMO system or another type of communication system.

Preferably the apparatus includes at least one initialisation element operable to initialise the secondary data store associated with an identity matri'x having dimensions corresponding to said given matrix. For example, if a matrix characterising a MIMO system is 4x4, the secondary data store would be characterised by the 4x4 identity matrix.

This invention allows the set of operations of the primary array to be captured by the secondary array acting on an identity matrix. The primary array allows the matrix A to be orthogonalised and the second array will capture these operations as a matrix transform. This transform may be a right or left side transform.

In another aspect, the invention is a method of diagnosing a multiple antenna transmission system by determining a transmission matrix characterising the system and performing a decomposition of the matrix to determine diagonal values of a diagonal matrix provided by said singular value decomposition, said singular value decomposition being performed by applying transformations to orthogonalise at least first and second component vectors of the transmission matrix, wherein the method includes the steps of: iteratively applying plane rotations according to a sense of rotation and a degree of rotation to corresponding elements of two vectors selected from the matrix to adjust the inner product of the two vectors; calculating the sign of the inner product of said first and second vectors; wherein the sense of rotation applied is determined by the sign of the inner product.

The plane rotations may be applied as binary shift and add functions.

In a further aspect of the invention there is provided a method of controlling a multiple antenna transmission system by determining a transmission matrix characterising the system and performing a decomposition of the matrix to determine a transformation matrix operable on a vector characterising symbols to be transmitted to provide a resultant transmission matrix with substantially orthogonal component vectors, wherein the method includes performing a singular value decomposition including the steps of: iteratively applying plane rotations according to a' sense of rotation and a degree of rotation to corresponding elements of two vectors selected from the matrix to adjust the inner product of the vectors; and calculating the sign of the inner product of said first and second vectors between iterations; wherein the sense of rotation is determined by the sign of the inner product of the two vectors.

A decomposition may be a singular value decomposition.

As used herein a matrix data store may be any data store operable to store a matrix.

The method may include applying an identical iterations of plane rotations to an identity matrix so as to recover a matrix transformation equivalent to the iterations of plane rotations.

In a further aspect of the invention there is provided a data processing method for adjusting the inner product of two vectors provided in a data store the method including the steps of: reading said vectors from said data store; iteratively applying plane rotations according to a sense of rotation and a rotation to corresponding elements of two vectors selected from the matrix to adjust the inner product of the two vectors; calculating the sign of the inner product of said first and second vectors between iterations; and writing said vectors to the data store after one or more iterations, wherein the sense of rotation is determined by the sign of the inner product of the two vectors.

Another aspect of the invention provides a data processing method for adjusting the inner product of two vectors provided in a data store, the method including the steps of: reading said vectors from said data store; applying shift and add functions so as to iterativel'y adjust the inner product of the two vectors according to a sense of rotation and a degree of rotation; calculating the sign of the inner product of said two vectors between iterations; and reversing the sense of rotation in response to said inner product reversing; and writing said vectors to the data store after one or more iterations.

Certain embodiments of the present invention may provide a data processing apparatus for implementing a reduced complexity process of adjusting the inner product of vectors.

Certain embodiments of the present invention may provide a data processing apparatus for implementing reduced complexity orthogonal transformations for vectors or matrices.

Certain embodiments of the invention may provide a data processing apparatus for implementing reduced complexity singular value or eigenvalue decompositions on matrices.

Certain embodiments of the present invention may provide a diagnostic method for multiple antenna systems which is suitable for reduced complexity processing resources.

Certain embodiments of the present invention may provide a control method for multiple antenna transmission systems which is suitable for reduced complexity processing resources.

Specific embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which: Figure 1 depicts a plane rotation of a two-dimensional vector, which may be formed from corresponding elements of vectors; Figure 2 depicts plane rotations applied to pairs compristhg corresponding elements of a pair of vectors; ---Figure 3 is a schematic diagram of a device for applying plane rotations to corresponding elements of a pair of real vectors according to an embodiment of the present invention; Figure 4 is a schematic diagram of a CORDIC processing element according to a first embodiment of the present invention; Figure 5 is a schematic diagram of a control element of the vector rotation device of figure 3 for calculating the inner product of a pair of real vectors according to an embodiment of the present invention; Figure 6 is a schematic diagram of a vector rotation device in accordance with a second embodiment of the invention, the device being depicted with plane rotations being applied to a pair of complex vectors in a time resolved phase in which the real part of the inner product of a vector is evaluate& Figure 7 is a schematic diagram of the device illustrated in figure 6, in a time resolved phase in which the imaginary part of the inner product of the pair of vectors is evaluated; Figure 8 is a schematic diagram depicting a CORDIC processing element in accordance with an embodiment of the invention, the CORDIC processing element incorporating cross coupling; Figure 9 is a schematic diagram of a control element of the vector rotation device of figure 8 for calculating the inner product of a pair of complex vectors; and Figure 10 is a schematic diagram of a hardware implementation of a singular value decomposition of a 4x4 matrix according to an embodiment of the present invention.

In relation to a first described specific embodiment of the' invention, figure 3 depicts a hardware implementation of a device 10 operable to apply plane rotations. The device comprises an array 17 of CORDIC processing elements 4. These plane rotations adjust the inner product of a pair of vectors x andy.

Each CORDIC processing element 4 receives corresponding elements (marked as x,, y,) of a pair of vectors and applies plane rotations to the vector formed by those components. Rotated versions of x,, y, are computed by the CORDIC processing elements 4 and fed back internally as will be described in due course. The CORDIC processing elements 4 are connected to an inner product processing element 21 which computes the inner product of the vectors x andy from their collective outputs. The sign of this inner product is then fed back to the CORDIC processing elements 4 to control further rotations of the pairs of elements (x,, y,).

Figure 4 illustrates one of the CORDIC processing elements 4, in further detail. Those skilled in the art will recognise this as a device consisting of well-known hardware elements. The CORDIC processing element 4 has a pair of multiplexers 5 and 6 which provide access to registers 7 and 8 respectively. These registers 7 and 8 are directly connected to addition/subtraction elements 9 and 10 respectively. Although these registers are illustrated in figure 4 as being designated "xreg" and "yreg", they are not specifically configured to store respective x and y components and are in general multipurpose.

The registers 7 and 8 are connected indirectly to their opposite addition/subtraction elements 10 and 9, via corresponding right-shift elements 11 and 12. The addition/subtraction elements 9 and 10 feed back to the multiplexers 5 and 6 so that their respective results can be reiterated in a subsequent operation. The CORDIC processing element 4 also has a control input 13 which is connected to both addition/subtraction elements 9 and 10 to determine whether they add or subtract.

The addition/subtraction elements 9 and 10 are oppositely configured, with respect to the control input 13. That is, when addition/subtraction element 9 is configured to add its inputs, namely the contents of the xreg register 7 and the right shifted output of shift element 12, addition/subtraction element 10 is correspondingly configured to subtract the right shifted output of shift element 11 from the contents of the yreg register 8.

Those skilled in the art will notice that the CORDIC processing element 4 of Figure 4 does not comprise any multiplier elements. Implementation of multiplier elements is relatively complex compared to addition/subtraction elements, registers, and shift register elements shown in Figure 4, and can impose substantial demands on chip area in an integrated circuit.

A control signal input for the CORDIC processing elements 4, and specifically the addition/subtraction elements 9 and 10, of Figure 2 is provided using the output of the control element 21. The control input 13 of the CORDIC 4 receives the output of the control element 21 which determines the sense of rotation applied by the CORDIC processing element 4. The sense of rotation will, therefore, be determined by the sign of the inner product.

A simple example of a plane rotation of corresponding elements x andy of vectors (x,y) through an angle 8 can be described by the matrix equation rx'l cos& sin OTxl r 1 tan 61[x = I. Ii I = cos 91 II LY'i [-sinG cos OJyj tanG 1 JLY By composing a plane rotation from a sequence of elementary rotations with tan 6 the major part of the computation can be implemented using shift and add operations provided by the CORDIC processing element 4. This explains the significance of using shift elements 11, 12 in the CORDIC processing element 4. For an initial value of n0, tan 9 = I and so 9 = 45°. Then, successive angles of rotation expressed by values of n can be tabulated as: n 0 1 2 3 4 5 91° 45 27 14 7.2 3.6 1.8 Thus, it can be seen that, with the above successive rotatins available through simple right shift functions, the vectors formed by pairs of elements of x and y can be rotated to a point where the inner product of x and y is acceptably close to zero. What amounts to "acceptably close" will be implementation dependent -the device can in any case be configured to cease applying rotations when a minimum rotation angle has been reached. If, in accordance with the above tabulated data, a tolerance of 2 degrees is sufficient, then rotations can be terminated on completion of the sixth iteration, i.e. when n5.

The above equation shows that a scaling factor cos 9 needs to be accounted for, for each applied rotation, in order to finalise the vector transformation. Each application of a rotation involves multiplication by a scaling factor cos 9. Thus, if several rotations are applied, and in either direction, multiple scaling factors will be applied. It should also be noted that the value of 0 will vary per rotation, in accordance with increments of n in the above table. Thus, a scaling factor of the following form will arise: cos45°xcos27°xcosl4°x This scaling factor is accumulated over repeated elementary rotations and applied at a last stage of the computation. Although calculation of the scaling factor is thus possibly non-trivial, it can be performed off line if the sequence of values of 0 is known. This is the case if the sequence n0, 1,2. . . n is established beforehand. Thus, if the number of applied rotations is known, and successive rotations correspond to successive values in the series n=0, 1..., then a corresponding sequence of scaling factors can be determined before use of the device and used on a look-up basis by the device. A simple multiplication element can be used (such as shift and add) which can apply the scaling factor without complicating the device unduly. The scaling factor can with be looked up, or can be a constant if the number of applied rotations is a constant. The scaling factor can be applied off line, and indeed outside the confmes of the CORDIC processor implementing the rotations. It can be applied by another device such as that instructing the CORDIC processor to perform the rotations.

It is also worth observing that, with increasing number of applied iterations, the scaling factor will tend towards approximately 0.607, so this scaling factor could be used universally if at least a sufficient number of iterations will always be performed.

The manner in which the device is used in practice will now be explained. Initial signals x, y are presented to the respective CORDIC processing elements 4. On the basis of rotations applied by the CORDIC processing elements 4, under the control of the control element 21, transformed versions of the signal vectors x, y are gradually formed. The CORDIC processing element 4 applies a plane rotation to the elements selected at step 41. This plane rotation is applied according to a degree of rotation selected from a sequence defined by arctan(2). The initial rotation is arctan 2° which is 45 degrees.

The control element 21 computes the sign of the inner product of the interim values of the rotated first vectors, indicated as x, y with elements x1 and y1 respectively at each CORDIC processing element 4. The signal output by the control element is indicative of the sign of the computed inner product and this signal is applied to the input 13 of the CORDIC processing element 4. This control signal determines the sense' of the next plane rotation to be applied. The control signal specifies whether the elements 9 and 10 are to perform addition or subtraction operations. These elements can be configured either to add or to subtract -it will be appreciated that, when one of elements 9 and 10 is configured to add, the other is correspondingly configured to subtract.

If the sign of the inner product changes from one iteration to the next (which is indicative from the control signal), this signifies that the most recent rotation caused the inner product to pass through a zero point. It is thus appropriate to reverse the sense of rotation in the next iteration. The control element 21 causes the CORDIC processing element 4 in such circumstances to apply an opposite sense of rotation in the next iteration by means of the control signal supplied at input 13. As indicated in figure 4, the same control is applied to both addition/subtraction elements 9 and 10. An appropriate control input will be apparent to those skilled in the art being anything which determines the sense of rotation applied by the CORDIC processing element 4 to change the sense of rotation applied for a subsequent iteration if the sign of the inner product changes.

Figure 5 is a schematic diagram depicting part of the control element 21 which calculates the sign of the inner product of two vectors to provide a control for the addition/subtraction elements of the CORDIC processing element 4. The control element 21 has an array of multiplier elements 111. These multiplier elements each receive two outputs (xí,y1) from a respective CORDIC processing element 4. Respective pairs of multiplier elements 111 are connected to addition elements 112, followed by further successive rows of addition elements 113, 114. The addition elements 112, 113 and 114 thus establish an adder tree, which is a structure which will be known by the skilled person as an efficient way of adding many terms together simultaneously. The numbers of elements and layers of elements required for various sizes and types of vectors will be apparent to those skilled in the art.

The structure of the control element 21 will be understood by the skilled reader to be a straightforward way of embodying in hardware the calculation of an inner product.

Once the control element 21 has determined the sign of the inner product, and this has been redirected to the CORDIC processing elements, their addition/subtraction elements 9 and 10, appropriately configured by control signal 13, compute the next set of values of x and y. The new values represent a plane rotated' vector of the form (x,y).

Although not specifically illustrated, the maimer in which iteration is terminated should be considered. On the one hand, it would be possible to perform a fixed number of iterations on each occasion -as can be seen above, the sequence based on arctan 2 does rapidly converge towards zero and so after a few iterations, only minor adjustments to orthogonality would be implied by such small rotations.

On the other hand, it could also be possible to incorporate in the control element 21 a comparator of the inner product with a threshold-if the magnitude of the inner product is beneath that threshold then the control element would cease operation of the CORDIC processing elements and the final values of x'1 and y" per CORDIC processing elements would be adopted as the output from the device.

Generally, therefore, the iterations can be said to terminate when the pair of vectors are sufficiently orthogonal, wherein the question as to whether the orthogonality is sufficient is determined either by assumption on design of the device or by measurement during operation.

Each iteration carried out by the CORDIC processing elements 4 involves the next degree of rotation selected from the sequence described above. The implementation of the sequence is by way of shift operations performed by the shift operators 11, 12. The sequence is defined by arctan 2, where n is a positive integer which increases along the sequence. The length of the sequence arctan 2 can be used to determine the precision of orthogonalisation. In some embodiments, the sequence may conceivably be relatively short, and the inner product of the vectors will only be partially adjusted towards 0.

This allows a trade-off between precision and processor speed or processor complexity.

This trade-off can be predetermined or adjusted dynamically, as discussed above.

Once the CORDIC processing elements 4 have completed their iterations, then the two resultant vectors are multiplied by the relevant scaling factor for the number of rotations that have been applied in the various iterations of operation of the CORDIC processing elements. This is simply looked up as noted previously and used to govern the operation of a shift and add multiplier as previously described.

Operation of the device terminates with the pair of vectors orthogonal, or approximately orthogonal, and scaled to substantially their original magnitudes. It is contemplated that the device can be used as a type of co-processor; that is providing a specialised calculation facility which can be used as required by a more general purpose data processing apparatus.

While the above essentially makes the assumption that the vectors to be rendered mutually orthogonal are real, a further embodiment of the invention will now be described which is suitable to the situation wherein elements of the vectors have complex values.

In such a situation, it is convenient to treat the two parts of the vectors (the Real and Imaginary parts) as separate problems to be solved. This can be done as is explained in the following paragraphs.

The inner product x.y of two complex vectors x and y is defined by xy where x is the conjugate transpose of x x.y = Xiyj + X2Y2 + X33 + + Considering a single component of this sum: ifx1 t+iu and y1v+iw Then x1y1 = (t -iu)(v + iw) = (tv + uw) + i(tw -uv) Under the unitary transformation r. . ..i I 2 V 1LA V [y; y y] [-sin cos y2 which describes a rotation by angle 6 in the real x, y plane, XI' = x1cosO + yjsinO = [t + iu]cosO + [v + iw]sinO = cosO{[t + v tanO] + i[u + w tan8]} yl' = y1cosO -x1sin6 [v + iw]cosB -[t + iu] sin8 = cosO{[v -t te] + i[w -u tanO]} Thus, the inner product: xi'yi' = (x1cos + yisin8) (y1cose -x1sin0) ([t + iu]cosB + [v + iw]sine)* ([v + iw]cosO -[t + iu]sine) = ([t cosO + v sinO] -i[U cose+w sin8])([v cosO -, t sinO] + i[w cosO -u sinO]) = ([t cosO + v sine] [v cose -t sine] + [u cose+w sin8] [w cose -u sinO]) + i([t cose + v sine] [w cose -u sinO] -[u cose+w sinO] [v cosO -t sine]) [tv + uw] [cos2O -sin20] + [v2 -t2 + w2 112] cosO sinO + i([tw-uv] [cos20+sin20]) = [tv + uw] cos29 + V2[v2 -t2 + w2 -U2] sin2e + i(tw -uv) From the above, the reader will appreciate that the imaginary part of the inner product is unaffected by the rotation, while the real part varies sinusoidally at twice the frequency of the applied rotation. That is, as e varies through 90°, the real part of the inner product varies sinusoidally through 1800 and therefore includes at least one zero.

Through such rotation (which can be termed the R-phase), control is provided by (tv + uw) summed with similar products, i.e. > {Re(x1).Re(y) + Im(x1).Im(y1)}.

Under the unitary transformation Ix; ; 1 -I o, 2 [y y yj -Lisine cosej y1 y2 y which describes rotation in the imaginary x, y plane, x1' = x1cosO + iy1sinO = [t + iu]cose + i[v + iw]sinO = cos8{[t -w tanO] + i[u + v tane]} Yi' = y1cose + ix1sine = [v + iw]cose + i[t + iu] sine = cose{[v -u tan8} + i[w + t tan8]} The inner product of the vectors rotated in the imaginary plane is expressed by: xl'yi' (x1cose + iy1sine) (y1cos9 + ixisine) = ([t + iu]cose + i[v + iw]sinO) ([v + iw]cosO + i[t + iu]sine) = ([t cosO -w sinO] -i[u cosO + v sinB])([v cosO -u sine] + i[w cos9 + t sine]) = ([t cosO -w sine] [v cose -u sinO] + [u cosB+v sine] [w cos9 + t sine]) + i([t cos8 -w sine] [w cosO + t sinO] -[u cosO+v sine] [v cosO -u sine]) = [tv + uw][cos28 + sin2e] + i([tw -uv] [cos2O -sin2O] -[v2 -t2 + w2 -u2] cosO sinG) = [tv + uw] + i([tw -uv] cos2G -/2[V2 -t2 + w2 -u2] sin2O) In this case, it is the real part which is unchanged and the imaginary part which varies sinusoidally at twice the frequency of the applied rotation. In this so called I-phase, control is provided by (tw -uv) summed with similar products, i.e. {Re(x).Im(y1) -Im(x1).Re(y1)}.

In figure 6, the CORDIC processing elements 4 are configured to deliver output signals to the control element 21 to determine changes in the sign of the real part of the inner product. That is, CORDIC processing elements 4 are arranged in successive pairs, the left hand of which is designated to determine rotation of the real part of a 2-vector defined as (x1, y1), the right hand of the pair being designated to determine rotation of the corresponding imaginary part of the 2-vector.

According to this arrangement, therefore, it will be understood that twice as many CORDIC processing elements 4 are provided as the dimension of the input vectors x and y. For example, if, in figure 6, n4, then eight CORDIC processing elements 4 are provided.

Changes in the real part of the inner product presented by rotations imposed by this bank of CORDIC processing elements 4 to signals introduced at their inputs as so indicated will signify passage over a zero crossing point of the real part of the inner product. According to the above analysis, in the R-phase the imaginary part of the inner product will not vary.

Once rotation in the R-phase has been completed satisfactorily, the inputs to the CORDJC processing elements 4 are reconfigured as shown in figure 7, for the I-phase.

In this configuration, the real part of the x element of the resented 2-vector is presented to one of the inputs of the left hand CORDIC processing element in such pairs, while the imaginary part of the y element is presented to the other input of that CORDIC processing element. Conversely, the imaginary part of the x element is presented to one of the inputs of the right hand CORDIC processing element of the pair, and the real part of the y element is presented to the other input of the right hand CORDIC processing element. This corresponds to the cross multiplication performed, as described above, in determining an imaginary part of an inner product of two complex vectors. Rotations are then governed by the imaginary part of the inner product determined in the control element 21.

Although the above arrangement operates perfectly satisfactorily, a further embodiment takes advantage of performance efficiencies and eliminates a possible source of delay in arriving at a result.

Figure 8 depicts this further embodiment of a CORDIC processing element 304, which includes a cross coupled structure. Many aspects of the function of the illustrated CORDIC processing element 304 are shared with those of the CORDIC processing element of figure 4, but additional cross coupling provides enhancements over the use of a pair of the previously described CORDIC processing elements 4.

The CORDIC processing element 304 comprises first, second, third and fourth inputs, operable to receive input signals corresponding to vectors to be rotated. These inputs are to respective multiplexers 311, 312, 313, 314 which switch between data entry in the form of input signals as mentioned, and feeding back interim values from use of the CORDIC processing element 304. The signal output from the multiplexers 311-314 are fed to respective registers 321, 322, 323, 324. In contrast to those used in CORDIC processing element 4 previously described, these registers are marked "regi" and so on in figure 8 to emphasise the fact that the CORDIC processing element 304 is relatively flexibly used.

As previously, the output from each register is fed forward to a respective addition/subtraction element 341, 342, 343, 344. Each respective addition/subtraction element is also configured to receive an output of a right hift element coupled with an adjacent register, that is: addition/subtraction element 341 receives output of a shift element 332 taking input from register 322 (reg2) addition/subtraction element 342 receives output of a shift element 331 taking input from register 321 (regi); addition/subtraction element 343 receives output of a shift element 334 taking input from register 324 (reg4); and addition/subtraction element 344 receives output of a shift element 333 taking input from register 323 (reg3).

As before, addition/subtraction element 341 is operable to add when addition/subtraction element 342 is operable to subtract, and addition/subtraction element 343 is operable to add when addition/subtraction element 344 is operable to subtract.

Further cross coupling is exhibited in the present embodiment. The output from addition/subtraction element 341 (ri') is fed back to its own multiplexer 311; likewise the output from addition/subtraction element 344 (r4') is fed back to multiplexer 314.

On the other hand, the output from addition/subtraction element 342 (r2') is fed to multiplexer 313, and the output from addition/subtraction element 343 (r3') is fed to multiplexer 312.

An R-phase is commenced by loading the registers with values as follows: reg I reg 2 reg 3 reg 4 Re(xi) Re(yi) Im(yi) Im(xi) Then, after one processing phase, the values of these inputs are presented back to the multiplexers so that, in essence, the following data is available to the CORDIC processing engine 304 in the next cycle: reg I reg 2 reg 3 reg 4 Re(xi)' Im(yj)' Re(yi)' Im(xj)' From the analysis set out above, it will be seen that this presents the data in a form suitable for the cross multiplication associated with calculation of inner products for the I-phase.

Thus, the CORDIC processing element 304 is suitably configured for determination of R-phase and I-phase calculations in alternating phases.

The control element 221 illustrated in figure 9 is suitably used with this CORDIC control element to provide control signals to the addition/subtraction elements 341-344.

The control element 221 includes pipeline registers 220 to split the calculation of the inner product into distinct time phases. This is appropriate because the calculation of the inner products can take some time, in comparison with the rotation operations of the CORDIC processing elements 304. Thus, by pipelining the inner product calculations, the CORDIC processing elements 304 can alternate between R-phase and I-phase operations. The position of the pipeline registers 220 is conveniently determined so that the processing delay above the registers 220 is substantially the same as that below. In this way, the calculations associated with R-phase operation can be ready for deployment at the next R-phase operation, while the I-phase calculations are being progressed through the top half of the control element 221.

The device so illustrated in figures 8 and 9 is capable of processing a complex vector in alternating real and imaginary time phases.

In a first time slot, designated as an R-phase, calculation commences of the imaginary part of the inner product of the vectors to be treated. At the same time, the CCPEs are determining data for rotation in the real plane.

In a next time slot, designated an I-phase, calculation of the imaginary part of the inner product completes. In the same I-phase, the imaginary part of the inner product is used to control rotation of the vectors in the imaginary plane. Calculation of the real part of the inner product also commences in that I-phase, on the basis of the data generated in the previous slot.

In the next time slot, which is the next R-phase, calculation of the real part of the inner product completes and is used to control rotation of the vectors in the real plane.

The inner product is calculated using the control element 221 shown in Figure 8. In this case, commencement and completion of the calculation can be understood to correspond to operation before and after the pipeline registers 220.

It will be understood that when the above described CORDIC array is used to operate on complex vectors in alternate and interleaved time phases, initialisation may give rise to a challenge. The challenge arises from the two-stage pipeline of 221. When the CORDTC registers are first loaded, the addition/subtraction control inputs receive an arbitrary value during the first phase and the sense of rotation is, therefore, unpredictable. Various techniques can be applied to overcome this problem, as will be apparent to those skilled in the art.

The portions of the CORDIC processing element so described can share a single controller to calculate the inner product because operations may be applied which alter the real and imaginary parts of the inner product of vectors independently. Sharing a control element allows complexity to be reduced further with CORDIC operations being applied in interleaved time intervals. By interleaving, although just as many CORDIC cycles are required as for the non-cross-coupled CORDIC processing elements 4 illustrated in figures 6 and 7, the device can make use of a more efficient control path such that the device is not waiting' for control signals to be derived from interim inner product determinations per cycle. Instead, inner product determinations are pipelined, allowing a substantial increase in processing speed by overlapping the inner product determination with the CORD1C implemented rotation.

Figure 10 depicts a device 26 for capturing orthogonal transformations of 4x4 matrices according to an additional aspect of the present invention. Those skilled in the art will recognise that capturing orthogonal transformations facilitates singular value decompositions SVDs, for example. Therefore, the device 26 will be referred to as a SVD processor'. This SVD processor 26 has a data store referred to here as a UH register file 27, which is initialised with an identity matrix. It will be apparent to those skilled in the art that the register file 27 may be designated other than UH, for alternative applications of the processor 26. The SVD processor 26 also has an A register file 28 which is initialised with a matrix A to be treated. If, for example, the SVD processor 26 is to be applied to MIMO system diagnostics or control, the A matrix stored in the A register file 28 would be the CSI or H matrix of the MIIvIO system.

The SYD processor 26 also has two arrays 29 and 30 of CORDIC processing elements 29a-29d and 30a-30d. Each of these cross-coupled processing elements 29a-30d corresponds to a CORDIC processing element 304 such as depicted in Figure 9. For the purpose of illustration, the first array of CORDIC processing elements 29 will be referreti to as the uH processing array and the second array of cross-coupled processing elements 30 will be referred to as the A processing array.

The UH array 29 is connected to the UH register 27 so as to read from and update the UH register 27. Similarly, the A processing array 30 is connected so as to read from and update the A register 28.

The SVD processor 26 also has a controller 221 (as illustrated in figure 9) which computes the sign of the inner product of vectors output from the A processing array 28.

The SVD processor 26 is also in communication with a host processor (not shown). In use, the host processor causes the A processing array 30 to read a selected pair of vectors from the A register file 28 and iteratively apply plane rotations to pairs of vectors to rotate them relative to each other according to a sense and degree. These vectors are selected according to the method described in the paper by Hestenes, referenced.

Between each iteration, the controller 221 computes the sign of the inner product of the vector output by the A processing array 30 and provides a control signal to the cross-coupled processing elements 29a- 29d of the UH array 29 and 30a-30d of the A processing array 30. The control signal and the configuration of the cross-coupled CORDIC processing element 30a-30d determines the sense of rotation according to the sign of the inner product.

During this process, the host processor (not shown) causes the UH processing array 29 to perform the same operations as the A processing array 30. These operations are performed on the identity matrix with which the UH register file 27 is initialised.

Those skilled in the art may identify the single value decomposition processor depicted in Figure 10 as being a form of co-processor accelerator. The SVD processor 26 utilises low complexity, CORDIC processors to perform SVDs and capture orthogonal transforms.

The utility of the singular value decomposition processor 26 is illustrated by reference to the following definition of a singular value decomposition, known to those skilled in the art.

UHA = EVH According to this definition, VH is a matrix which has orthogonal rows.

The SVD processor 26 is to be used repeatedly on different selections of vectors selected from the A register file, according to Hestenes' method, to orthogonalise those vectors relative to each other. This operation will result in EVH or, by definition, UHA.

The set of unitary operations which result in VH, a matrix with mutually orthogonal constituent vectors, is characterised by UH. The S\TD processor 26 captures these operations the UH array performing operations on an identity matrix that are identical to those performed on A by the array 30. The UH register file provides a matrix transformation which is operable on a vector characterising symbols to be transmitted.

Those skilled in the art will appreciate that as the processor 26 captures orthogonal transforms, it can be applied to the problem of performing EVDs. Analogous apparatus for treating matrices of other dimensions will be apparent to those skilled in the art.

They may, for example, include additional cross-coupled CORDIC Processing Elements (CCPEs).

A 4x4 matrix such as a CSI matrix of a MIIvIO system will be used to demonstrate operation of the processor 26. The processor 26 will provide a transform matrix for preconditioning the vector characterising symbols to be transmitted. Application to other matrices in the same or other technical areas will also be apparent to those skilled in the art.

A transmission matrix A is used to initialise the A register 28 and an identity matrix is used to initialise the U' register 27. A pair of rows of the matrix A are selected from the A register 28. Typically, although not exclusively, complex matrices are processed by the SVD processor 26. Hestenes' method provides a known method of selecting pairs of vectors from a matrix and orthogonalising them to orthogonalise the matrix.

The A array 29 adjusts the inner product of the vectors towards orthogonality. As previously discussed, the cross-coupled CORDIC processing elements (CCPE) process the real and imaginary parts of the inner product of these vectors independently in alternate interleaved time slots. The control element 221 computes the sign of the inner product of the vectors processed by the A processing array 30 and provides a control signal for the CCPEs 304. As discussed in reference to a single CORDIC processing element, the control signal provided by the controller 221 causes the sense of the rotations applied by the A processing array 30 to be determined by the sign of the inner products. The sense of rotation is determined by the size of the inner product. If an iteration of plane rotations causes the sign of the inner product to reverse, the next iteration will apply plane rotations with reversed sense of rotation.

Exactly the same rotations are applied to corresponding vectors selected from the UH register 27. These operations are repeated until the selected rows are orthogonal or approximately orthogonal.

A decision needs to determine how many iterations are carried out. In practice, the decision may be predetermined by the length of a sequence from which the CCPEs 304 draw the degrees of rotation which they are to apply. Any alternative means known to those skilled in the art may be used in place of the length of a sequence these means may include reference to an error value for the magnitude of the inner product of vectors.

Then, either the above operations are repeated and a new pair of rows of A are selected according to Hestenes' method or the process terminates. It will be apparent to those skilled in the art that if Hestenes' method is followed, eventually, each vector will be orthogonal to each other vector in the matrix A. At this point, the A register 28 will contain EV and the UH register 27 will contain UH.

Since A is already known, by reference to a suitable definition of a singular value decomposition (SVD), it will be apparent to those skilled in the art that the register files now store all of the parts necessary to determine the SVD.

It will be appreciated that a processor in this form can be used as part of a larger MIMO system, for instance to determine a precoding matrix for a given CSI matrix. Moreover, the processor can be used to apply SVD to a matrix in the determination of a channel estimate in a MIMO system.

The embodiments of the invention described herein provide devices from which multiplier elements have been largely eliminated or at least minimised in their use.

Therefore, the embodiments of the invention provides low complexity devices, orthogonalising vectors, performing SVDs or EVDs, or diagnosing or controlling multiple antenna transmission systems.

The reader will appreciate that the foregoing is but one example of implementation of the present invention, in that further aspects, variations and advantages may arise from using the invention in different embodiments. The scope of protection is intended to be provided by the claims appended hereto, which are to be interpreted in the light of the description with reference to the drawings and not to be limited to the above.

Claims

CLAIMS: 1. A data processing apparatus for processing data defining first and second vectors, the apparatus being operable to adjust the inner product of the vectors, said apparatus comprising: an array of processing elements operable to apply iteratively plane rotations to pairs of corresponding components of the first and second vectors; and a control element operable to calculate the inner product of said first and second vectors following processing by said array, and to provide to the array a control signal based on said inner product; wherein the array oIprocessing elements is operable to receive the control signal and to apply successively smaller rotations with the sense of each rotation determined according to the control signal.
2. Apparatus in accordance with claim 1, wherein each of said successively smaller rotations is defined by arctan2 wherein n is a whole number incremented per rotation.
3. Apparatus in accordance with claim 2, wherein the array of processing elements is operable to cease iteration after a predetermined number of degrees of rotation from said sequence has been applied.
4. Apparatus in accordance with claim 2 wherein the array of processing elements is operable to cease iteration when said inner product is below a predetermined threshold.
5. Apparatus in accordance with any one of the preceding claims wherein the control element is operable to determine a control signal bearing information on the sign of the inner product.
6. Apparatus in accordance with any one of the preceding claims, wherein each processing element is operable to receive two signals as inputs and provide two signals outputs.
7. Apparatus in accordance with claim 6, wherein each processing element includes first and second registers each communicating substantially directly with a respective additjonJsubtraction element and each communicating indirectly with the respective addition/subtraction element of the other register through a right shift element, wherein each addition/subtraction element is operable to receive the control signal and either add or subtract according to the control signal and wherein said addition/subtraction elements are configured such that when one is configured to add, the other is configured to subtract.
8. Apparatus in accordance with any preceding claim and operable to process signals representative of complex valued vectors, said apparatus being operable in a first phase in which signals are processed from which an a real part of said inner product can be derived and a second phase in which signals are processed from which an imaginary part of said inner product can be derived.
9. Apparatus in accordance with claim 8 and wherein said control element comprises information storage means operable to enable said control element to establish a pipeline of inner product determinations.
10. Apparatus in accordance with claim 10 wherein said apparatus is operable alternately in said first and second phases.
11. A CORDIC processing element operable to apply rotational transformations to pairs of corresponding real and imaginary components of first and second vectors defined in complex space, wherein said CORDIC processing element comprises first, second, third and fourth registers each communicating substantially directly with a respective addition/subtraction element and each communicating indirectly with the respective addition/subtraction element of another register through a right shift element, the first and second addition/subtraction element and said third and said fourth addition/subtraction element being so arranged in respective cross coupled pairs, wherein each addition/subtraction element is operable to receive the control signal and either add or subtract according to the control signal and wherein said addition/subtraction elements in said pairs are configured such that when one is configured to add, the other is configured to subtract, and wherein outputs of said first and fourth additionlsubtraction elements are configured to feed back respectively to said first and fourth registers and said second and third addition/subtraction elements are cross coupled back respectively to said third and second registers so that real and imaginary parts of rotated vector components can be calculated in said one device in successive phases of operation.
12. A CORDIC processing element in accordance with claim 11 and including first, second, third and fourth multiplexers through which corresponding registers are operable to receive initial data.
13. A CORDIC processing element in accordance with claim 12 wherein said initial data comprises first, second, third and fourth data elements being real and imaginary parts of corresponding scalar quantities to be processed.
14. A CORDIC processing element in accordance with claim 13 wherein said first element comprises a real part of a first scalar quantity, said second element comprises a real part of a second scalar quantity, said third element comprises an imaginary part of said second scalar quantity, and said fourth element comprises an imaginary part of said first scalar quantity.
15. A CORDIC processing element in accordance with claim 14 and operable to output respective first, second, third and fourth data elements defining real and imaginary parts of said first and second scalar quantities, following rotation.
16. Data processing apparatus comprising a plurality of CORDIC processing elements in accordance with any one of claims 11 to 15, each being operable on corresponding elements of first and second vectors, and a control element operable to determine respective real and imaginary phase control signals, on the basis of an inner product of said vectors after said successive operations, for said CORDIC processing elements.
17. Data processing apparatus in accordance with claim 16 wherein said control element comprises pipelining means operable to enable a pipelined determination of successive control signals for successive real and imaginary phases of said CORDIC processing elements.
18. Matrix information processing apparatus comprising first and second arrays each comprising a plurality of CORDIC processing elements in accordance with any one of claims 11 to 15, first and second matrix register means operable to store information defining a matrix to be processed by said first and second arrays, and row selection means operable to select two rows of a matrix for processing, and control means operable to generate control signals on the basis of operation of said first array, said control signals being operable to control both said first and second arrays of CORDIC processing elements.