CN103927290A

CN103927290A - Inversion method for a lower triangular complex matrix of arbitrary order

Info

Publication number: CN103927290A
Application number: CN201410156677.4A
Authority: CN
Inventors: 李丽; 杨丹; 虞潇; 潘红兵; 何书专; 王堃
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2014-04-18
Filing date: 2014-04-18
Publication date: 2014-07-16

Abstract

The invention relates to an inversion method for a lower triangular complex matrix of arbitrary order. The method comprises the following steps: (1) a reciprocal unit is provided, which takes the reciprocal of each diagonal element of an N-order matrix L and outputs the resulting matrix; (2) a multiply-accumulate unit is provided, which receives the matrix output by the reciprocal unit and performs a multiply-accumulate operation on the first to the (i-1)-th elements of the i-th row of the matrix; (3) a negate-multiply unit is provided, which receives the accumulation result corresponding to the elements of the i-th row, negates it, and multiplies the negated result by the diagonal element of the i-th row to obtain the elements of the i-th row of the inverse matrix $L^{-1}$. Throughout the process, several multiply-accumulate units compute in parallel. The method has the following advantages: it can invert a lower triangular complex matrix of arbitrary order without being limited by the number of arithmetic units; and its multiply-accumulator uses only one complex adder and one complex multiplier, which saves hardware resources, while an effective parallelization scheme maintains computational efficiency.

Description

An inversion method for a lower triangular complex matrix of arbitrary order

Technical field

The present invention relates to hardware structures and implementation methods for matrix inversion, and in particular to an inversion method for a lower triangular complex matrix of arbitrary order.

Background art

There are many methods of matrix inversion, such as the adjugate matrix method, the elementary transformation method, the block matrix method and Gaussian elimination. Most of them suffer from complicated computation and large storage requirements, and are not suitable for hardware implementation. At present, hardware platforms mainly use methods based on matrix decomposition, of which there are three main kinds: LU decomposition, QR decomposition and Cholesky decomposition. QR decomposition is widely applicable, but its computation is too complicated for hardware implementation. Cholesky decomposition is comparatively simple, but it applies only to real symmetric positive definite matrices, so its scope is too narrow, and its square-root operation consumes considerable hardware resources. The conditions for LU decomposition are easily met and its computational complexity is moderate, so it is suitable for hardware implementation. All three decompositions produce triangular matrices; LU decomposition in particular factors a matrix into a lower triangular matrix and an upper triangular matrix. The study of triangular matrix inversion therefore has important practical significance.

For the hardware implementation of triangular matrix inversion, usually only a lower triangular inversion module needs to be developed; an upper triangular matrix can reuse that module after conversion by means of the matrix transpose. The most widely used hardware structure for lower triangular matrix inversion at present is the systolic array. Its advantages are a high degree of parallelism and a small number of execution cycles, but its shortcomings are also clear: it consumes a great deal of hardware resources, since the number of floating-point units it requires grows as the square of the matrix order N; and its timing control is rather complex, with very frequent data communication between arithmetic units. Despite repeated improvement and optimization, the systolic array structure remains very complicated and is difficult to apply to the inversion of high-order matrices. No existing hardware design effectively solves these problems, so it is necessary to redesign and optimize the hardware structure for lower triangular matrix inversion.

Summary of the invention

The object of the invention is to overcome the above deficiencies of the prior art and to provide an inversion method for a lower triangular complex matrix of arbitrary order that supports decomposition-based matrix inversion algorithms, increases computation speed and saves hardware resources. This is achieved by the following technical scheme:

The inversion method for a lower triangular complex matrix of arbitrary order comprises the following steps:

(1) providing a reciprocal unit, which takes the reciprocal of each diagonal element of the N-order matrix L and outputs the matrix after the reciprocal operation;

(2) providing a multiply-accumulate unit, which receives the matrix after the reciprocal operation, performs a multiply-accumulate operation on the first i-1 elements of row i of the matrix, and outputs the accumulation result for row i, where i is an integer greater than or equal to 2 and the initial value of i is 2;

(3) providing a negate-multiply unit, which receives the accumulation result corresponding to the elements of row i, negates it and then multiplies it by the diagonal element of row i, thereby obtaining the elements of row i of the inverse matrix $L^{-1}$;

(4) incrementing i by 1 and repeating steps (2) and (3) until i = N, finally obtaining the inverse matrix $L^{-1}$.
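The four steps above can be sketched in software as follows. This is a minimal behavioral sketch in Python, not the hardware design: the function name `invert_lower_triangular` and the 3x3 test matrix are hypothetical, and the inner summation range (k from j to i-1) is an assumption based on the inverse also being lower triangular.

```python
def invert_lower_triangular(L):
    """Row-by-row inversion of a nonsingular lower triangular complex matrix."""
    n = len(L)
    S = [[0j] * n for _ in range(n)]

    # Step (1): reciprocal of every diagonal element.
    for i in range(n):
        S[i][i] = 1 / complex(L[i][i])

    # Steps (2)-(4): process rows 2..N (indices 1..n-1), one row at a time,
    # because row i needs the already inverted rows above it.
    for i in range(1, n):
        for j in range(i):
            # Step (2): multiply-accumulate l_ik * S_kj.
            acc = 0j
            for k in range(j, i):
                acc += L[i][k] * S[k][j]
            # Step (3): negate, then multiply by the reciprocal diagonal of row i.
            S[i][j] = -acc * S[i][i]
    return S


# Hypothetical example matrix (not taken from the patent).
L = [[2 + 0j, 0, 0],
     [1j, 1 + 1j, 0],
     [1, 2, 4j]]
S = invert_lower_triangular(L)
```

Multiplying `L` by `S` should give the identity matrix up to floating-point rounding, which is a convenient check for any hardware model of the same recurrence.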

In a further design of the method, the multiply-accumulate unit in step (2) operates according to the formula $\sum_{k=j}^{i-1} l_{ik} S_{kj}$, computing row by row; within each row, the elements of different columns are computed in parallel with degree of parallelism M, where $1 \le j < i \le N$ and $S_{ij}$ denotes an element of the inverse matrix $L^{-1}$.

In a further design of the method, the multiply-accumulate unit comprises:

a complex multiplier, which receives the corresponding $l_{ik}$ and $S_{kj}$, performs the complex multiplication of $l_{ik}$ by $S_{kj}$, and outputs the product;

a complex adder, which receives the product, performs the complex addition, and outputs the accumulation result;

and a logic control unit, communicatively connected to the complex multiplier and the complex adder respectively, which implements self-accumulation on the pipelined complex adder.

In a further design of the method, a corresponding storage scheme is set for step (2) according to the degree of parallelism M of the computation. The storage scheme comprises storing the original matrix L and the result matrix S each in N different contiguous storage units, the N storage units being arranged in order of their addresses, where N is an integer greater than or equal to 2M.

In a further design of the method, a memory access mechanism is established according to the storage scheme. For each element of the matrix, a storage unit k is determined from the column index j of the element and the total number of storage units N, where k denotes the k-th storage unit, k is an integer less than or equal to N, and k = j mod N. The element is stored in the corresponding k-th storage unit, which allows parallel memory access and thus the parallel computation with degree of parallelism M.
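The mapping k = j mod N can be illustrated with a short sketch. The helper names `bank_of` and `conflict_free` are hypothetical, and the check assumes that the M elements fetched in the same cycle come from adjacent columns, which the text does not state explicitly.

```python
def bank_of(col, num_banks):
    """Storage unit index for an element in column `col` (k = j mod N)."""
    return col % num_banks


def conflict_free(first_col, parallelism, num_banks):
    """True if `parallelism` adjacent columns land in distinct storage units."""
    banks = {bank_of(first_col + t, num_banks) for t in range(parallelism)}
    return len(banks) == parallelism


M = 2                  # degree of parallelism
NUM_BANKS = 2 * M      # smallest unit count allowed by the rule N >= 2M

# Storage unit chosen for columns j = 1..8.
mapping = [bank_of(j, NUM_BANKS) for j in range(1, 9)]
print(mapping)  # [1, 2, 3, 0, 1, 2, 3, 0]
```

Because adjacent columns always map to different storage units, M elements from neighboring columns can be read in the same cycle without contending for one memory.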

In a further design of the method, the reciprocal unit comprises two real multipliers, a real adder, a real divider and a complex multiplier. The two real multipliers are each connected to an input of the real adder; the output of the real adder is connected to an input of the real divider; the output of the real divider is connected to an input of the complex multiplier; and the other input of the real divider is fixed to 1.

In a further design of the method, the negate-multiply unit comprises a delay logic block, a negation logic block and a complex multiplier, the delay logic block and the negation logic block each being connected to an input of the complex multiplier.

The advantages of the invention are as follows:

(1) The invention can invert a lower triangular complex matrix of arbitrary order; as long as the memory capacity is large enough, it is not limited by the number of arithmetic units.

(2) The self-designed pipelined multiply-accumulator uses only one complex adder and one complex multiplier, saving approximately 15% of hardware resources, while pipelined computation maintains efficiency.

(3) The invention provides an effective parallelization scheme whose degree of parallelism is highly scalable. The degree of parallelism can be chosen flexibly according to the available hardware resources, storage resources and matrix order, which maximizes hardware utilization and computational efficiency and meets the requirements of high-speed computation.

Brief description of the drawings

Fig. 1 is the flow chart of the matrix inversion algorithm of the embodiment of the invention.

Fig. 2 is the circuit structure diagram of the reciprocal unit.

Fig. 3 is the circuit structure diagram of the multiply-accumulate unit.

Fig. 4 is the circuit structure diagram of the negate-multiply unit.

Fig. 5 is the functional simulation diagram of the self-designed multiply-accumulator.

Fig. 6 is an explanatory diagram of the four-way parallel execution scheme of the embodiment of the invention.

Fig. 7 is a schematic diagram of the storage rule of the embodiment of the invention.

Fig. 8 is the matrix order versus speedup trend chart for four-way parallel execution.

Embodiment

The scheme of the invention is described in detail below with reference to the drawings.

The inversion method for a lower triangular complex matrix of arbitrary order provided by this embodiment comprises the following steps:

(1) A reciprocal unit is provided, which takes the reciprocal of each diagonal element of the N-order matrix L and outputs the matrix after the reciprocal operation.

(2) A multiply-accumulate unit is provided, which receives the matrix after the reciprocal operation, performs a multiply-accumulate operation on the first i-1 elements of row i of the matrix, and outputs the accumulation result for row i, where i is an integer greater than or equal to 2 and the initial value of i is 2. The multiply-accumulate unit operates according to the formula $\sum_{k=j}^{i-1} l_{ik} S_{kj}$, computing row by row; within each row, the elements of different columns are computed in parallel with degree of parallelism M, where $1 \le j < i \le N$ and $S_{ij}$ denotes an element of the inverse matrix $L^{-1}$.

(3) A negate-multiply unit is provided, which receives the accumulation result corresponding to the elements of row i, negates it and then multiplies it by the diagonal element of row i, thereby obtaining the elements of row i of the inverse matrix $L^{-1}$.

Further, this embodiment sets a corresponding storage scheme according to the degree of parallelism M of the computation. The storage scheme comprises storing the original matrix L and the result matrix S each in N different contiguous storage units. The N storage units are arranged in order of their addresses, where N is an integer greater than or equal to 2M.

A memory access mechanism is established according to this storage scheme. For each element of the matrix, a storage unit k is determined from the column index j of the element and the total number of storage units N, and the element is stored in the corresponding k-th storage unit, which allows parallel memory access and thus the parallel computation with degree of parallelism M. Here k denotes the k-th storage unit, k is an integer less than or equal to N, and k = j mod N.

The scheme of the invention is described in detail below with reference to the drawings, taking as an example the optimized inversion of a 96-order complex matrix based on LU decomposition.

In this embodiment, a 96-order single-precision floating-point complex matrix is randomly generated with Matlab and LU-decomposed, and the resulting L and U matrices are inverted by the following scheme:

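It is known that an N-order nonsingular matrix $A$ has an LU decomposition $A = LU$, that is, $A$ can be factored into the product of a lower triangular matrix whose main diagonal entries are all 1 and an upper triangular matrix, as shown in formula (1).

$$A = LU = \begin{bmatrix} 1 & & & \\ l_{21} & 1 & & \\ \vdots & \vdots & \ddots & \\ l_{N1} & l_{N2} & \cdots & 1 \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1N} \\ & u_{22} & \cdots & u_{2N} \\ & & \ddots & \vdots \\ & & & u_{NN} \end{bmatrix} \qquad (1)$$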

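For the lower triangular complex matrix L obtained from the LU decomposition, let $S = L^{-1}$; the following formula then holds:

$$S_{ij} = \begin{cases} 1 / l_{ii}, & i = j \\ -S_{ii} \sum_{k=j}^{i-1} l_{ik} S_{kj}, & i > j \\ 0, & i < j \end{cases} \qquad (2)$$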
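The recurrence for the inverse of a lower triangular matrix can be checked with a short derivation. This is a sketch under the assumption that $S = L^{-1}$ has elements $S_{ij}$ and that $l_{ij}$ denotes the elements of $L$; it uses only $LS = I$ and the fact that both matrices are lower triangular, so that only the terms with $j \le k \le i$ survive in each product.

$$\sum_{k=j}^{i} l_{ik} S_{kj} = \delta_{ij}, \qquad 1 \le j \le i \le N$$

$$i = j: \quad l_{ii} S_{ii} = 1 \;\Rightarrow\; S_{ii} = \frac{1}{l_{ii}}$$

$$i > j: \quad l_{ii} S_{ij} + \sum_{k=j}^{i-1} l_{ik} S_{kj} = 0 \;\Rightarrow\; S_{ij} = -\frac{1}{l_{ii}} \sum_{k=j}^{i-1} l_{ik} S_{kj} = -S_{ii} \sum_{k=j}^{i-1} l_{ik} S_{kj}$$

The three hardware stages map onto this result directly: the reciprocal gives $S_{ii}$, the multiply-accumulate gives the sum, and the negate-multiply applies the factor $-S_{ii}$. When $L$ has a unit diagonal, $S_{ii} = 1$ and the last stage reduces to a sign change.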

Since L is a unit lower triangular matrix, the computation steps involving its diagonal elements can be simplified.

For the upper triangular complex matrix U obtained from the LU decomposition, the identity $U^{-1} = \left((U^{T})^{-1}\right)^{T}$ can be used: U is first transposed into a lower triangular complex matrix, which is then inverted, and the result is transposed again to obtain the inverse of U.
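The transpose trick for the upper triangular factor can be sketched as below. The helper names `invert_lower`, `transpose` and `invert_upper` and the example matrix are hypothetical; the transpose is the plain (non-conjugate) transpose, and the lower triangular routine is a compact software stand-in for the hardware module rather than a description of it.

```python
def invert_lower(L):
    """Compact lower triangular inversion (software stand-in for the module)."""
    n = len(L)
    S = [[0j] * n for _ in range(n)]
    for i in range(n):
        S[i][i] = 1 / complex(L[i][i])
        for j in range(i):
            acc = sum(L[i][k] * S[k][j] for k in range(j, i))
            S[i][j] = -acc * S[i][i]
    return S


def transpose(A):
    """Plain transpose, without complex conjugation."""
    return [list(row) for row in zip(*A)]


def invert_upper(U):
    """Invert an upper triangular matrix by reusing the lower triangular routine."""
    return transpose(invert_lower(transpose(U)))


# Hypothetical upper triangular example.
U = [[1 + 1j, 2, 3j],
     [0, 2, 1 - 1j],
     [0, 0, 4]]
U_inv = invert_upper(U)
```

Only one inversion routine is needed in this arrangement, which mirrors the stated motivation for developing a single lower triangular module.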

The invention does not cover the matrix transpose unit; only the inversion of lower triangular matrices is described. In what follows, the U matrix is therefore taken to be the matrix after transposition. According to the above formulas, the L and U matrices are inverted by the following steps:

(1) Reciprocal operation

According to formula (2), the reciprocals of the diagonal elements of the matrix must be computed first.

For the L matrix, the data input only needs to be assigned the value 1, because the diagonal elements of L are all 1.

For the U matrix, the reciprocals of the diagonal elements must be computed. The structure of the reciprocal unit circuit is shown in Fig. 2; it includes an operand address generation unit and a corresponding result address generation unit. According to formula (3), the reciprocal unit of this embodiment consists mainly of two real multipliers, a real adder, a real divider and a complex multiplier. The two real multipliers are each connected to an input of the real adder. The output of the real adder is connected to an input of the real divider. The output of the real divider is connected to an input of the complex multiplier, and the other input of the real divider is fixed to 1.
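The data path of the reciprocal unit can be modeled in a few lines. Formula (3) is not reproduced in this text, so the sketch assumes it is the standard identity $1/(a+bi) = (a-bi)/(a^2+b^2)$, which matches the listed components; the function name `complex_reciprocal` and the routing of the conjugate into the complex multiplier are likewise assumptions.

```python
def complex_reciprocal(z):
    """Behavioral model of the reciprocal unit for one diagonal element."""
    a, b = z.real, z.imag
    aa = a * a                  # real multiplier 1
    bb = b * b                  # real multiplier 2
    norm = aa + bb              # real adder
    inv_norm = 1.0 / norm       # real divider, numerator input fixed to 1
    conj = complex(a, -b)       # conjugate of the operand (assumed wiring)
    return conj * inv_norm      # complex multiplier


r = complex_reciprocal(3 + 4j)
print(r)  # approximately (0.12-0.16j)
```

Fixing the divider numerator to 1 means only a single real division is needed per diagonal element; the remaining work is done by multipliers and one adder.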

(2) Multiply-accumulate operation

This step is computed according to the formula $\sum_{k=j}^{i-1} l_{ik} S_{kj}$. Because the computation of each row requires the inversion results of the preceding rows, this step can compute only one row at a time, starting from row 2. The corresponding hardware is the multiply-accumulator.

The multiply-accumulate unit consists mainly of a complex multiplier, a complex adder and a logic control unit. The complex multiplier receives the corresponding $l_{ik}$ and $S_{kj}$, performs the complex multiplication of $l_{ik}$ by $S_{kj}$, and outputs the product. The complex adder receives the product, performs the complex addition, and outputs the accumulation result. The logic control unit is communicatively connected to the complex multiplier and the complex adder respectively and implements self-accumulation on the pipelined complex adder; the circuit structure is shown in Fig. 3. Because the complex adder in this embodiment uses a 4-stage pipeline, it cannot accumulate directly. To address this, the logic control unit of this embodiment is designed as two controllers, the control_add_a and control_add_b modules in the figure, which apply a certain delay to the last four numbers produced by self-accumulation so that the accumulation can be completed. Pipelined computation is used to hide the delay introduced by the logic control and to improve efficiency. A simple simulation test of the multiply-accumulate unit was carried out to verify its functional correctness; the result is shown in Fig. 5. As the figure shows, the last two accumulations are performed after a delay inserted by the logic control, which realizes the self-accumulation of the complex adder.
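One way to picture accumulation on a pipelined adder is the behavioral model below. It is a sketch under an assumption the text does not spell out: that a 4-stage adder fed one product per cycle effectively keeps four interleaved partial sums, which are then merged pairwise in extra, delayed additions. The names `PIPELINE_DEPTH` and `pipelined_accumulate` are hypothetical, and the model ignores cycle-level timing.

```python
PIPELINE_DEPTH = 4  # adder latency stated in the embodiment


def pipelined_accumulate(products, depth=PIPELINE_DEPTH):
    """Accumulate products as a `depth`-stage pipelined adder would.

    Returns the total and the number of final merge rounds.
    """
    # A result re-enters the adder only `depth` cycles later, so product t
    # joins the running sum of products t-depth, t-2*depth, ...
    partial = [0j] * depth
    for t, p in enumerate(products):
        partial[t % depth] += p

    # Merge the partial sums pairwise; each round needs the previous one to
    # finish, which is where the control logic would insert delay.
    rounds = 0
    while len(partial) > 1:
        merged = []
        for i in range(0, len(partial), 2):
            pair = partial[i:i + 2]
            merged.append(pair[0] + pair[1] if len(pair) == 2 else pair[0])
        partial = merged
        rounds += 1
    return partial[0], rounds


total, rounds = pipelined_accumulate([1 + 1j, 2, 3j, 4, 5 - 2j])
print(total, rounds)  # (12+2j) 2
```

With four partial sums the merge takes two rounds, which is consistent with the two delayed final accumulations described for Fig. 5, although the actual controller behavior may differ from this model.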

(3) Negate-multiply operation

Each row output by step (2) is passed through the negate-multiply operation to obtain the inversion result for that row, which can then be used in the computation of subsequent rows. The negate-multiply unit of this embodiment consists mainly of a delay logic block, a negation logic block and a complex multiplier. The delay logic block and the negation logic block are each connected to an input of the complex multiplier.

For the L matrix, because the diagonal elements are 1, the multiplication can be omitted and only the sign bits of the input data need to be inverted.

For the U matrix, the input data must be negated and then multiplied by the diagonal element of the corresponding row. The circuit structure of this negate-multiply unit is shown in Fig. 4.
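The two cases of the negate-multiply step can be written as one small function. This is an illustrative sketch: the name `negate_multiply` and the use of `None` to select the unit-diagonal (L) path are assumptions, and the diagonal value passed in is taken to be the reciprocal already produced in step (1).

```python
def negate_multiply(acc, diag_reciprocal=None):
    """Negate an accumulation result and scale it by the row's diagonal value.

    acc             -- complex accumulation result from the multiply-accumulate step
    diag_reciprocal -- reciprocal of the row's diagonal element, or None for the
                       unit-diagonal L matrix, where the multiplication is skipped
    """
    if diag_reciprocal is None:
        # L path: flip the sign of the real and imaginary parts only.
        return complex(-acc.real, -acc.imag)
    # U path: negate, then one complex multiplication.
    return -acc * diag_reciprocal


l_result = negate_multiply(1 - 2j)          # L matrix element
u_result = negate_multiply(2 + 0j, 0.5j)    # U matrix element
```

Skipping the multiplier on the L path is what allows the L execution units to produce their results with a simple sign-bit inversion.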

(4) After steps (2) and (3) have been repeated 95 times, the inverse matrices of L and U are obtained. The flow chart of the whole algorithm is shown in Fig. 1.

According to the invention, multi-way parallel execution can be realized by using a group of the multiply-accumulate units described above. This embodiment uses four-way parallel execution and therefore requires four multiply-accumulate units. Two of them are L execution units and the other two are U execution units, so the degree of parallelism for each of L and U is 2. Fig. 6 illustrates the parallel execution scheme for the L and U matrices; elements with different patterns are computed by the four multiply-accumulate units MAC1 to MAC4 respectively. According to the storage rule for a degree of parallelism of 2, the source matrix and the result matrix must each be stored in 4 different storage units. This embodiment uses a 2 MB SRAM divided into 32 banks of 32*1024*16 bit each, and 3*4 banks are needed to meet the storage demand of the 96-order matrices. The concrete storage scheme of the L matrix is shown in Fig. 7; the U matrix is stored in the same way.

The multiply-accumulate operation is the core of the whole computation, so parallelizing it is equivalent to parallelizing the whole inversion. Because the number of nonzero elements in a lower triangular matrix increases by one from row to row, alternating between odd and even, full parallelization cannot be achieved. The theoretical speedup of four-way parallel execution over serial execution in this embodiment is given by a formula [not reproduced in this text], and the matrix order versus speedup trend is shown in Fig. 8. As can be seen, the larger the matrix order N, the larger the speedup and the better the parallel effect. In this embodiment, the theoretical speedup for the inversion of 96-order matrices reaches 3.979.

The design of this embodiment is based on a 1 GHz clock, and the run time of the four-way parallel inversion of the 96-order L and U matrices is 112500 ns. By theoretical calculation, two-way parallel execution takes 203085 ns and four-way parallel execution takes 108617 ns. The design thus completes lower triangular complex matrix inversion efficiently at low hardware cost, with very high parallel efficiency: four-way parallel execution is almost twice as fast as two-way. In addition, the divider used in step (1) of this embodiment does not support pipelined operation and is relatively inefficient, which affects the inversion efficiency to some extent; a more efficient divider would further shorten the computation time.
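The quoted timings can be compared with a few lines of arithmetic. The variable names are hypothetical; the three durations and the 1 GHz clock are the figures stated above, and one clock cycle is taken as 1 ns.

```python
NS_PER_CYCLE = 1  # 1 GHz clock: one cycle per nanosecond

reported_four_way_ns = 112500   # run time stated for the design
theory_two_way_ns = 203085      # theoretical two-way parallel time
theory_four_way_ns = 108617     # theoretical four-way parallel time

# Clock cycles for the reported four-way run.
cycles_four_way = reported_four_way_ns // NS_PER_CYCLE

# Theoretical gain from doubling the number of execution units.
speedup_four_vs_two = theory_two_way_ns / theory_four_way_ns

# Relative gap between the reported and the theoretical four-way time.
overhead = reported_four_way_ns / theory_four_way_ns - 1

print(cycles_four_way)
print(round(speedup_four_vs_two, 2))
print(round(overhead * 100, 1))
```

The ratio of the two theoretical times is about 1.87, which is the basis for the "almost twice" statement, and the reported run time is roughly 3.6% above the theoretical four-way figure.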

Claims

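1. An inversion method for a lower triangular complex matrix of arbitrary order, characterized in that it comprises the following steps:

(1) providing a reciprocal unit, which takes the reciprocal of each diagonal element of the N-order matrix L and outputs the matrix after the reciprocal operation;

(2) providing a multiply-accumulate unit, which receives the matrix after the reciprocal operation, performs a multiply-accumulate operation on the first i-1 elements of row i of the matrix, and outputs the accumulation result for row i, where i is an integer greater than or equal to 2 and the initial value of i is 2;

(3) providing a negate-multiply unit, which receives the accumulation result corresponding to the elements of row i, negates it and then multiplies it by the diagonal element of row i, thereby obtaining the elements of row i of the inverse matrix $L^{-1}$;

(4) incrementing i by 1 and repeating steps (2) and (3) until i = N, finally obtaining the inverse matrix $L^{-1}$.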

2. The inversion method for a lower triangular complex matrix of arbitrary order according to claim 1, characterized in that the multiply-accumulate unit in step (2) operates according to the formula $\sum_{k=j}^{i-1} l_{ik} S_{kj}$, computing row by row, and within each row the elements of different columns are computed in parallel with degree of parallelism M, where $1 \le j < i \le N$ and $S_{ij}$ denotes an element of the inverse matrix $L^{-1}$.

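3. The inversion method for a lower triangular complex matrix of arbitrary order according to claim 2, characterized in that the multiply-accumulate unit comprises:

a complex multiplier, which receives the corresponding $l_{ik}$ and $S_{kj}$, performs the complex multiplication of $l_{ik}$ by $S_{kj}$, and outputs the product;

a complex adder, which receives the product, performs the complex addition, and then outputs the accumulation result;

and a logic control unit, communicatively connected to the complex multiplier and the complex adder respectively, which implements self-accumulation on the pipelined complex adder.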

4. The inversion method for a lower triangular complex matrix of arbitrary order according to claim 3, characterized in that a corresponding storage scheme is set for step (2) according to the degree of parallelism M of the computation, the storage scheme comprising storing the original matrix L and the result matrix S each in N different contiguous storage units, the N storage units being arranged in order of their addresses, where N is an integer greater than or equal to 2M.

5. The inversion method for a lower triangular complex matrix of arbitrary order according to claim 4, characterized in that a memory access mechanism is established according to the storage scheme, the memory access mechanism comprising, for each element of the matrix, determining a storage unit k from the column index j of the element and the total number of storage units N, and storing the element in the corresponding k-th storage unit, which allows parallel memory access and thus the parallel computation with degree of parallelism M, where k denotes the k-th storage unit, k is an integer less than or equal to N, and k = j mod N.

6. The inversion method for a lower triangular complex matrix of arbitrary order according to claim 1, characterized in that the reciprocal unit comprises two real multipliers, a real adder, a real divider and a complex multiplier, the two real multipliers each being connected to an input of the real adder, the output of the real adder being connected to an input of the real divider, the output of the real divider being connected to an input of the complex multiplier, and the other input of the real divider being fixed to 1.

7. The inversion method for a lower triangular complex matrix of arbitrary order according to claim 1, characterized in that the negate-multiply unit comprises a delay logic block, a negation logic block and a complex multiplier, the delay logic block and the negation logic block each being connected to an input of the complex multiplier.