CN100383781C

CN100383781C - Cholesky decomposition algorithm device

Info

Publication number: CN100383781C
Application number: CNB2005101241769A
Authority: CN
Inventors: 冉静; 刘昕
Original assignee: Beijing T3G Technology Co Ltd
Current assignee: Beijing T3G Technology Co Ltd
Priority date: 2004-11-26
Filing date: 2005-11-21
Publication date: 2008-04-23
Anticipated expiration: 2025-11-21
Also published as: CN1783060A

Abstract

The present invention relates to a Cholesky decomposition algorithm device comprising a memory, at least two component calculation units, a multiplexer, a normalization unit, a reciprocal square root unit and a diagonal multiplier. The memory stores the matrix data of a particular column calculated by the device, for use as input to subsequent calculations. The component calculation units perform component calculations on externally input matrix data using the matrix data stored in the memory. The multiplexer selects and outputs a component calculation result. The normalization unit normalizes the component calculation result output by the multiplexer. The reciprocal square root unit takes the square root and the reciprocal of a normalized component to obtain its square root and reciprocal square root, and locks a particular reciprocal square root. The diagonal multiplier multiplies the normalized component by the locked reciprocal square root and outputs the result to the memory. Because the device can process matrix data in parallel, digital signal processing speed is significantly improved.

Description

Cholesky decomposition algorithm device

Technical field

The invention relates to a decomposition algorithm device, and in particular to a device for the Cholesky decomposition of matrices.

Background technology

In digital signal processing, matrix operations are very common. To reduce computational difficulty, transforming, simplifying and decomposing matrices is a routine approach.

Cholesky decomposition is a very common matrix decomposition method. Its basic principle is: for an n-th order symmetric (Hermitian) positive definite matrix A, there exists a lower triangular matrix L such that A = L L^{H}, where the diagonal entries of L are all positive real numbers and L^{H} denotes the conjugate transpose of L:

A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} = L \cdot L^{H} = \begin{bmatrix} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & l_{nn} \end{bmatrix} \cdot \begin{bmatrix} l_{11} & l_{21}^{*} & \cdots & l_{n1}^{*} \\ 0 & l_{22} & \cdots & l_{n2}^{*} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & l_{nn} \end{bmatrix} \qquad (1)

The basic formulas of Cholesky decomposition are:

l_{jj} = \left( a_{jj} - \sum_{p=1}^{j-1} l_{jp} l_{jp}^{*} \right)^{1/2} \qquad (2)

l_{ij} = \left( a_{ij} - \sum_{p=1}^{j-1} l_{ip} l_{jp}^{*} \right) / l_{jj} \qquad (3)

where j = 1, 2, ..., n and i = j+1, ..., n. The initial values of l_{jj} and l_{ij} are:

l_{11} = a_{11}^{1/2}

and l_{i1} = a_{i1} / l_{11} (i = 2, 3, ..., n).
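As a minimal sketch, formulas (2) and (3) can be transcribed directly into Python. The function name `cholesky_direct` and the nested-list matrix layout are illustrative assumptions, not part of the patent; indices are 0-based, so column `j` in the code corresponds to column j+1 in the formulas.

```python
import math


def cholesky_direct(a):
    """Return lower-triangular L with a = L * L^H, using formulas (2) and (3)."""
    n = len(a)
    l = [[0j] * n for _ in range(n)]
    for j in range(n):
        # Formula (2): the diagonal element is real and positive
        # when the input matrix is positive definite.
        s = complex(a[j][j]) - sum(l[j][p] * l[j][p].conjugate() for p in range(j))
        l[j][j] = complex(math.sqrt(s.real))
        # Formula (3): elements below the diagonal in column j.
        for i in range(j + 1, n):
            s = complex(a[i][j]) - sum(l[i][p] * l[j][p].conjugate() for p in range(j))
            l[i][j] = s / l[j][j]
    return l
```

For the 2 × 2 matrix with rows (4, 2+2j) and (2-2j, 6), this sketch returns rows (2, 0) and (1-1j, 2).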

According to the basic principle of Cholesky decomposition, every element of L is obtained recursively and depends on the data in the preceding rows and columns. The usual method computes L row by row, which takes a long time, and the computation time grows sharply as n becomes large.

Likewise, a computing device that implements this method must compute L row by row and therefore needs a long time to process such a matrix operation. This slows down digital signal processing, making such a device unsuitable for digital signal processing in high-rate or high-volume communication.

Summary of the invention

To overcome the slow digital signal processing of existing Cholesky decomposition algorithm devices, the invention provides a Cholesky decomposition algorithm device with a high digital signal processing speed.

To this end, the Cholesky decomposition algorithm device of the invention comprises: a memory for storing the matrix data of a particular column calculated by the device, for use as input to subsequent calculations; at least two component calculation units for performing component calculations on externally input matrix data using the matrix data stored in the memory; a multiplexer that selects and outputs the component calculation result of a component calculation unit; a normalization unit that normalizes the component calculation result output by the multiplexer; a reciprocal square root unit that takes the square root and the reciprocal of the normalized component to obtain its square root and reciprocal square root, and locks the reciprocal square root; and a diagonal multiplier that multiplies the normalized component by the locked reciprocal square root and outputs the result to the memory.

The beneficial effect is that, because the Cholesky decomposition algorithm device can process matrix data in parallel, digital signal processing speed is significantly improved.

Description of drawings

For a further understanding of the invention, refer to the drawings described below:

Figure 1 is a structural block diagram of the Cholesky decomposition algorithm device of the invention.

Figure 2 shows a Cholesky decomposition algorithm device that uses u component calculation units for component calculation, together with its operating steps.

Embodiment

The Cholesky decomposition algorithm device of the invention is described in detail below, taking its application in the joint detection algorithm of a Time Division-Synchronous Code Division Multiple Access (TD-SCDMA) terminal device (UE) as an example.

Algorithm basis

A terminal device can implement joint detection with many algorithms, but most are derived from the zero-forcing block linear equalizer (ZF-BLE). Its principle is to use the system matrix A and the noise matrix R_{n} to estimate the data symbol vector \hat{d} sent by the users.

This algorithm is expressed as:

\hat{d} = (A^{H} \cdot R_{n}^{-1} \cdot A)^{-1} \cdot A^{H} \cdot R_{n}^{-1} \cdot e \qquad (4)

Here, the system matrix A is determined by the spreading codes and channel impulse responses of the K users; d is the data symbols sent by the K users at the transmitter; and e is the received data sequence. The difficulty in implementing this algorithm lies in inverting the positive definite system matrix (A^{H} R_{n}^{-1} A).

The invention applies Cholesky decomposition to the matrix (A^{H} R_{n}^{-1} A), so that formula (4) can be expressed as:

\hat{d} = (L^{H})^{-1} \cdot L^{-1} \cdot A^{H} \cdot R_{n}^{-1} \cdot e \qquad (5)
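One common way to evaluate an expression like formula (5) in software is to apply the two triangular factors by forward and backward substitution instead of forming an explicit inverse. The sketch below illustrates this under the assumption that the vector b = A^{H} R_{n}^{-1} e has already been computed; the name `solve_with_cholesky` and the list-based layout are hypothetical and do not describe the patented hardware.

```python
def solve_with_cholesky(l, b):
    """Solve (L * L^H) x = b by forward then backward substitution."""
    n = len(l)
    # Forward substitution: solve L y = b.
    y = [0j] * n
    for i in range(n):
        acc = b[i] - sum(l[i][p] * y[p] for p in range(i))
        y[i] = acc / l[i][i]
    # Backward substitution: solve L^H x = y, where (L^H)[i][p] = conj(l[p][i]).
    x = [0j] * n
    for i in range(n - 1, -1, -1):
        acc = y[i] - sum(l[p][i].conjugate() * x[p] for p in range(i + 1, n))
        x[i] = acc / l[i][i].conjugate()
    return x
```

With L given by rows (2, 0) and (1-1j, 2) and b = (2+2j, 2+4j), the sketch returns x = (1, 1j).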

For formula (1), first let m_{ij}^{k} denote the k-th calculation component of l_{ij}:

m_{ij}^{k} = m_{ij}^{k-1} - l_{ik} l_{jk}^{*}, \quad (k = 1, 2, \ldots, j-1) \qquad (6)

where j = 1, 2, ..., n and i = j+1, ..., n. The initial value of m_{ij}^{k} is:

m_{ij}^{1} = a_{ij} - l_{i1} l_{j1}^{*}.

From formulas (2), (3) and (6), the elements l_{jj} and l_{ij} of matrix L can be calculated from the (j-1)-th calculation component m_{ij}^{j-1}:

l_{jj} = \left( m_{jj}^{j-1} \right)^{1/2} \qquad (7)

l_{ij} = m_{ij}^{j-1} / l_{jj} \qquad (8)

The initial values are:

l_{11} = a_{11}^{1/2}

and l_{i1} = a_{i1} / l_{11} (i = 2, 3, ..., n).
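The recurrence in formulas (6) to (8) can be sketched in software as follows. Each completed column triggers one more component update of every later column, which is the work the device spreads over its component calculation units. The function name `cholesky_by_components` is an assumption for illustration, indices are 0-based, and the update is also applied to the diagonal entry of each later column because formula (7) needs it.

```python
import math


def cholesky_by_components(a):
    """Compute L column by column, keeping the running components m_ij^k in place."""
    n = len(a)
    m = [[complex(v) for v in row] for row in a]  # m_ij^0 = a_ij
    l = [[0j] * n for _ in range(n)]
    for j in range(n):
        # Formula (7): the diagonal component is real for a positive definite input.
        l[j][j] = complex(math.sqrt(m[j][j].real))
        # Formula (8): scale the rest of column j by the reciprocal 1 / l_jj.
        inv = 1.0 / l[j][j].real
        for i in range(j + 1, n):
            l[i][j] = m[i][j] * inv
        # Formula (6): one more component update for every later column c > j,
        # including its diagonal entry (i == c).
        for c in range(j + 1, n):
            for i in range(c, n):
                m[i][c] -= l[i][j] * l[c][j].conjugate()
    return l
```

The inner update over `c` has no dependency between columns, so different columns can be handed to different units, which is the parallelism the following sections describe.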

Cholesky decomposition algorithm device

Refer to Fig. 1, which shows a Cholesky decomposition algorithm device provided by the invention and applied in the joint detection of a TD-SCDMA terminal device. The device can perform Cholesky decomposition on a matrix of up to 32 × 32, that is, a symmetric positive definite matrix A of order 32 that can be written in the form of formula (1). For this matrix A there exists a lower triangular matrix L such that A = L L^{H}, with all diagonal entries of L positive real numbers. The elements l_{jj} and l_{ij} of L can be calculated from the (j-1)-th calculation component m_{ij}^{j-1}, as in formulas (7) and (8).

The Cholesky decomposition algorithm device 10 comprises: a memory 11, two parallel component calculation units 12, a multiplexer 13, a normalization unit 14, a reciprocal square root unit 15 and a diagonal multiplier 16.

In this embodiment, the capacity of the memory 11 is 64 × 32 bits. The memory stores l_{ik} of the previous column and the just-calculated l_{i,(k+1)} of the current column as inputs for calculating m_{ij}^{k}. It only needs to hold 2n values, and the two column areas can be overwritten alternately. Each component can be written back into the memory that originally held the input a_{ij}. Here n denotes the dimension of the positive definite matrix. When a matrix with more rows and columns is processed, a memory with a larger capacity can of course be used.

Each component calculation unit 12 consists of a complex multiplier 121 and a complex accumulator 122 and is used to calculate components, which may be row components or column components. For convenience, the invention is described using the calculation of column components as an example. The components of a matrix column first undergo complex multiplication and then complex accumulation, yielding the result shown in formula (6).

The multiplexer 13 selects whether the currently completed component comes from component calculation unit (1) or (2). For ease of description, the invention selects in order from column 1 to column 32 of the 32 × 32 matrix: the component calculation result of column 1 is selected first, followed by the results of columns 2 to 32 in turn. The multiplexer thus selects which component calculation unit supplies the component m_{ij}^{j-1} that serves as input for calculating the elements l_{jj} and l_{ij} of the current column of L.

The normalization unit 14 normalizes the components calculated by the component calculation units 12 to prevent overflow in subsequent operations.

The reciprocal square root unit 15 comprises a multiplier 151 and a data processor 152. It performs the square root and reciprocal operations on the diagonal elements according to formulas (7) and (8), obtaining the square root l_{jj} and the reciprocal square root 1/l_{jj} of each such element, and it can lock the resulting reciprocal square root 1/l_{jj}.

The diagonal multiplier 16 performs multiplication on the elements of the same column: it takes the reciprocal square root of the diagonal element of each column and multiplies it in turn by the other normalized calculation components of that column. The diagonal multiplier 16 comprises a paired multiplier 161 and shifter 162; to process data faster, several multiplier and shifter pairs can operate in parallel. The multiplier 161 performs the multiplication, and the shifter 162 shifts the result of the multiplier 161 to obtain its fractional form, which is then output.
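The multiply-then-shift behaviour of the diagonal multiplier can be illustrated with fixed-point arithmetic. The sketch below assumes a signed format with 15 fractional bits (Q15) and represents a complex component as a pair of integers; the source does not state a word format, so `FRAC_BITS`, `to_fixed` and `diagonal_multiply` are hypothetical names chosen only for this example.

```python
FRAC_BITS = 15  # assumed Q15 fractional format; not specified in the source


def to_fixed(x):
    """Quantize a real value to a signed integer with FRAC_BITS fraction bits."""
    return int(round(x * (1 << FRAC_BITS)))


def diagonal_multiply(component, locked_rsqrt):
    """Multiply a fixed-point complex component by the locked real 1/l_jj.

    component is a (real, imag) pair of fixed-point integers. The raw product
    carries 2 * FRAC_BITS fraction bits, so it is shifted right by FRAC_BITS
    to return to the fractional format.
    """
    re = (component[0] * locked_rsqrt) >> FRAC_BITS
    im = (component[1] * locked_rsqrt) >> FRAC_BITS
    return (re, im)
```

Because 1/l_{jj} is real, the complex-by-real product needs only two real multiplications, one for each part of the component.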

Refer to Fig. 2, which uses u parallel component calculation units working simultaneously as an example to illustrate the operating steps of the Cholesky decomposition algorithm device:

Step 1: calculate the results of column 1:

First, the u parallel component calculation units 12 obtain the data of matrix A from an external memory (not shown), one data item per time step. In the first way of obtaining data, the 1st component calculation unit obtains the data of column 1 of matrix A from top to bottom, one item per time step. Alternatively, in the second way, the data of column 1 can be obtained in parallel: the 1st component calculation unit obtains the 1st item of column 1 of matrix A while the 2nd component calculation unit obtains the 2nd item, the i-th component calculation unit obtains the i-th item (i = 1, 2, ..., n, where n is the dimension of the positive definite matrix), and the u-th component calculation unit obtains the u-th item. If u is less than n, this operation is repeated until all the data of column 1 have been obtained. The first way is used as the example here.

At the 1st time step, when the 1st item of column 1 of matrix A is obtained, namely the diagonal element a_{11}, the 1st component calculation unit performs complex multiplication and complex accumulation on it according to formula (6) and obtains the corresponding calculation component, which at this point is still a_{11}: according to formula (6), the result for the data of column 1 of matrix A is equal to the data itself. The calculated component is sent through the multiplexer 13 to the normalization unit 14, which normalizes it to prevent overflow in subsequent operations.

At the same time, the first element of the column 1 components is sent by the normalization unit 14 to the reciprocal square root unit 15, which performs the square root and reciprocal operations and, according to formula (7), obtains the first diagonal entry l_{11} of matrix L and its reciprocal 1/l_{11}. l_{11} is output to an external memory (not shown), while the reciprocal square root unit 15 locks 1/l_{11}. This memory may be the memory that holds the input matrix A or a separate memory. If it is the input matrix memory, l_{11} can replace the element a_{11} of the original matrix A. For simplicity of description, the data of matrix L are written to a separate memory here and in what follows.

In the following (n-1) time steps, elements 2 to n of column 1 of matrix A pass in turn through complex multiplication and complex accumulation in the 1st component calculation unit and are then sent in turn, via the multiplexer and the normalization unit 14, to the diagonal multiplier 16. There, at time steps 2 to n, the 1/l_{11} locked by the reciprocal square root unit 15 is multiplied by elements 2 to n in turn, which yields the elements of column 1 of matrix L according to formula (8). The n-1 results are passed through the shifter 162 and sent in turn to both the memory 11 and the separate memory.

At this point, the separate memory holds the column 1 results l_{i1} of matrix L, and the memory 11 holds the column 1 results l_{i1} of matrix L other than l_{11}.

Step 2: calculate the first components of matrix L and the results of column 2:

From time step (n+1) to time step (2n-1), the u component calculation units 12 read data in parallel from the memory holding matrix A, as follows. The 1st component calculation unit reads the data of column 2 of matrix A and uses the obtained column 1 results l_{i1} of matrix L to calculate the first components according to formula (6). At the same time, the 2nd component calculation unit reads the data of column 3 of matrix A and uses the column 1 results l_{i1} to calculate the first components according to formula (6). In general, the i-th component calculation unit reads the data of column (i+1) of matrix A and uses the column 1 results l_{i1} to calculate the first components according to formula (6), and the u-th component calculation unit reads the data of column u+1. If u is less than n, this operation is repeated until the first components m_{ij}^{1} of all the data of matrix A have been calculated.

In Fig. 2, the number u of component calculation units 12 is smaller than the matrix dimension n, so the same component calculation unit 12 must be reused when calculating the first components; that is, component calculation unit u may need to perform component calculation for the columns at integer multiples of u. If the number of component calculation units 12 is greater than or equal to the matrix dimension, the components of all columns, or of all elements of one column, can be calculated in parallel when calculating the first components. How the data are supplied in that case depends on the memory 11 and can be configured as required.
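The reuse of component calculation units when u is smaller than n can be pictured with a small scheduling sketch. It assumes a round-robin assignment, in which unit 1 takes the first pending column, unit 2 the next, and the pattern repeats after unit u; the source only says the operation is repeated, so the exact order, like the name `assign_columns`, is an assumption.

```python
def assign_columns(n, u, step):
    """Map each component calculation unit to the columns it updates in one step.

    In step `step` (1-based, step >= 2), columns step..n still need a component
    update. Units are numbered 1..u and take the pending columns round-robin.
    """
    columns = list(range(step, n + 1))
    return {unit + 1: columns[unit::u] for unit in range(u)}
```

For n = 6 and u = 2, step 2 assigns columns 2, 4 and 6 to unit 1 and columns 3 and 5 to unit 2.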

Column 2 of matrix L is handled like column 1. Its first entry l_{22} (the second diagonal entry of L) is calculated in the reciprocal square root unit 15 by taking the square root of the first component m_{22}^{1}, and the other results l_{i2} of column 2 are calculated in the diagonal multiplier 16 from l_{22} and m_{i2}^{1} according to formula (8). At this point, the separate memory holds the column 1 results l_{i1} and the column 2 results l_{i2} of matrix L, and the memory 11 holds the column 1 results l_{i1} other than l_{11} and the column 2 results l_{i2} other than l_{22}. The first components m_{ij}^{1} of the other columns are all stored for use in the next step; the component calculation units 12 can write these component results into the memory that held the input matrix A, overwriting matrix A.

Step 3: calculate the second components of matrix L and the results of column 3:

From time step 2n to time step (3n-2), the column 2 results l_{i2} and the first components m_{ij}^{1} are used to calculate the second components m_{ij}^{2} of matrix L in parallel in the u component calculation units, starting from column 3, until the second components of all columns have been calculated. Column 3 of matrix L is handled like column 2: its first entry l_{33} (the third diagonal entry of L) is calculated in the reciprocal square root unit 15 by taking the square root of its second component m_{33}^{2}, and the other results l_{i3} of column 3 are calculated from l_{33} and m_{i3}^{2} according to formula (8). At this point, the separate memory holds the results l_{i1}, l_{i2} and l_{i3} of columns 1, 2 and 3 of matrix L, and the memory 11 holds the column 1 results other than l_{11}, the column 2 results other than l_{22} and the column 3 results other than l_{33}. If the memory 11 is used with alternate overwriting, the column 3 results l_{i3} other than l_{33} can overwrite the column 1 results l_{i1} other than l_{11}; that is, the column produced by the current calculation overwrites the oldest stored column. The second components of every column other than column 3 are stored for use in the next step; the component calculation units 12 can write these second-component results into the memory that held the input matrix A, overwriting the first components stored earlier.

Step 4: repeat the above process to calculate the (k-1)-th components of matrix L and the results of column k.

Step 5: calculate the (n-1)-th components of matrix L and the results of column n.

The (n-1)-th component m_{nn}^{n-1} of the last column is calculated, and the last entry l_{nn} of matrix L is obtained from it by taking the square root in the reciprocal square root unit 15. The whole matrix L has then been obtained.

This embodiment effectively combines the Cholesky decomposition with the matrix inversion of the joint detection algorithm: the reciprocals of the diagonal entries of L are calculated directly and stored, and the reciprocal and square root operations are merged. A single reciprocal square root unit (comprising a multiplier and a data processor) implements the reciprocal square root function with real multiplication and a look-up table, which saves one repeated reciprocal operation.
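One way to obtain both results from a look-up table and a single real multiplication is to read 1/sqrt(x) from the table and then form sqrt(x) = x · (1/sqrt(x)). The sketch below illustrates that idea only; the table size, the input range [0.25, 1) and the names `RSQRT_TABLE` and `rsqrt_and_sqrt` are assumptions, since the source does not describe the internals of the data processor 152.

```python
import math

TABLE_BITS = 8  # assumed table size of 256 entries; not given in the source

# 1/sqrt(x) sampled at the midpoint of each sub-interval of [0.25, 1).
RSQRT_TABLE = [
    1.0 / math.sqrt(0.25 + (i + 0.5) * 0.75 / (1 << TABLE_BITS))
    for i in range(1 << TABLE_BITS)
]


def rsqrt_and_sqrt(x):
    """Return (sqrt(x), 1/sqrt(x)) for a normalized real x in [0.25, 1)."""
    if not 0.25 <= x < 1.0:
        raise ValueError("x must be normalized into [0.25, 1)")
    index = int((x - 0.25) / 0.75 * (1 << TABLE_BITS))
    index = min(index, (1 << TABLE_BITS) - 1)  # guard against rounding at the top edge
    inv_root = RSQRT_TABLE[index]  # table look-up gives 1/sqrt(x)
    root = x * inv_root            # one real multiplication gives sqrt(x)
    return root, inv_root
```

The restriction to [0.25, 1) reflects the assumption that a preceding normalization scales the input by an even power of two, so that the scaling of the square root reduces to a shift.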

The Cholesky decomposition algorithm device is characterized in that the calculation of each element of L is divided into several small parts and, exploiting what the calculations of adjacent columns have in common, parallel component calculation units are used to compute in parallel. With two parallel component calculation units, this parallel pipelined operation can save nearly 50% of the running time. If more component calculation units are used to decompose a matrix of larger dimension, an even greater saving in computation time can be obtained. When u component calculation units operate in parallel on the components m_{ij}^{k} of the columns according to formula (6), nearly (u-1)/u of the running time can be saved.

The invention is described only with its application in a TD-SCDMA system as an example, but the device can also be used to implement Cholesky decomposition in other digital signal processing systems.
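The "nearly (u-1)/u" figure can be checked with a simplified cost model. The sketch below assumes that every column update occupies one time slot of equal length and that u units work on up to u columns per slot; this ignores the shrinking column length and all other pipeline stages, and the function names are hypothetical.

```python
def column_passes(n, u):
    """Count time slots for all component updates with u parallel units."""
    total = 0
    for k in range(1, n):
        remaining = n - k            # columns that still need update k
        total += -(-remaining // u)  # ceil(remaining / u) slots for this update
    return total


def runtime_saving(n, u):
    """Fraction of component-calculation time saved relative to one unit."""
    return 1.0 - column_passes(n, u) / column_passes(n, 1)
```

Under this model, a 32 × 32 matrix needs 496 slots with one unit and 256 slots with two, a saving of about 48%, which agrees with the "nearly 50%" stated above.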

Claims

1. A Cholesky decomposition algorithm device, comprising:

a memory for storing the matrix data of a particular column calculated by the Cholesky decomposition algorithm device, for use as input to subsequent calculations;

at least two component calculation units for performing component calculations on externally input matrix data using the matrix data stored in the memory;

a multiplexer that selects and outputs the component calculation result of a component calculation unit;

a normalization unit that normalizes the component calculation result output by the multiplexer;

a reciprocal square root unit that takes the square root and the reciprocal of the normalized component to obtain its square root and reciprocal square root, and locks the reciprocal square root; and

a diagonal multiplier that multiplies the normalized component by the locked reciprocal square root and outputs the result to the memory.

2. The Cholesky decomposition algorithm device of claim 1, wherein the component calculation unit comprises a complex multiplier and a complex accumulator.

3. The Cholesky decomposition algorithm device of claim 2, wherein the component calculation unit performs component calculation on the matrix data using the following formula:

m_{ij}^{k} = m_{ij}^{k - 1} - l_{ik} l_{jk}^{*}

(k = 1, 2, ..., j-1)

where j = 1, 2, ..., n,

i = (j+1), (j+2), ..., n,

and the initial value of m_{ij}^{k} is:

m_{ij}^{1} = a_{ij} - l_{i 1} l_{j 1}^{*},

where a_{ij} is an element of an n-th order positive definite matrix A, and l_{i1} and l_{j1}^{*} are elements of the lower triangular matrix corresponding to this matrix A and of the conjugate transpose of that lower triangular matrix, respectively.

4. The Cholesky decomposition algorithm device of claim 1, wherein the reciprocal square root unit locks the reciprocal square root of the component located on the diagonal in each column of the matrix.

5. The Cholesky decomposition algorithm device of claim 1, wherein the diagonal multiplier comprises a multiplier.

6. The Cholesky decomposition algorithm device of claim 5, wherein the diagonal multiplier further comprises a shifter.