CN103927290A - Inverse operation method for lower triangle complex matrix with any order - Google Patents
- Publication number: CN103927290A
- Application number: CN201410156677.4A
- Authority: CN (China)
- Prior art keywords: matrix, complex, unit, operation method, triangle
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Complex Calculations (AREA)
Abstract
The invention relates to an inversion method for lower triangular complex matrices of arbitrary order. The method comprises the following steps: (1) a reciprocal unit is provided, which takes the reciprocal of each diagonal element of an N-order matrix L and outputs the resulting matrix; (2) a multiply-accumulate unit is provided, which receives that matrix and performs multiply-accumulate operations over the first to the (i-1)th elements of the ith row; (3) a negate-multiply unit is provided, which receives the accumulation results for the elements of the ith row, negates them, and multiplies them by the diagonal element of the ith row to obtain the ith row of the inverse matrix $L^{-1}$. Throughout the process, several multiply-accumulate units compute in parallel. The method can invert lower triangular complex matrices of arbitrary order without being limited by the number of arithmetic units. The multiply-accumulator uses only one complex adder and one complex multiplier, which saves hardware resources, and an effective parallelization scheme maintains computational efficiency.
Description
Technical field
The present invention relates to hardware structures and implementation methods for matrix inversion, and in particular to an inversion method for lower triangular complex matrices of arbitrary order.
Background technology
There are many methods of matrix inversion, such as the adjugate matrix method, the elementary transformation method, the block matrix method, and Gaussian elimination. Most of them involve complicated computation or large storage requirements and are not suitable for hardware implementation. Hardware platforms currently rely mainly on methods based on matrix decomposition, of which there are three main kinds: LU decomposition, QR decomposition, and Cholesky decomposition. QR decomposition is widely applicable, but its computation is too complicated for hardware implementation. Cholesky decomposition is comparatively simple, but it applies only to real symmetric positive definite matrices, so its scope is too narrow, and its square-root operations consume considerable hardware resources. The conditions for LU decomposition are easily met and its computational complexity is moderate, which makes it suitable for hardware implementation. All three decompositions produce triangular matrices; LU decomposition factors a matrix into a lower triangular matrix and an upper triangular matrix. Research on triangular matrix inversion therefore has important practical significance.
For a hardware implementation of triangular matrix inversion, it is usually sufficient to develop a lower triangular inversion module; an upper triangular matrix can be converted by transposition so that it reuses the lower triangular module. The most widely used hardware structure for lower triangular matrix inversion today is the systolic array. Its advantages are high parallelism and a small number of execution cycles, but its drawbacks are also clear: it consumes a great deal of hardware resources, since the number of floating-point units it requires grows quadratically with the matrix order N, and its timing control is complicated, with very frequent data communication between arithmetic units. Despite repeated improvement and optimization, the systolic array structure remains complex and is difficult to apply to high-order matrices. No existing hardware design solves these problems effectively, so the hardware structure for lower triangular matrix inversion needs to be redesigned and optimized.
Summary of the invention
The object of the invention is to overcome the above deficiencies of the prior art by providing an inversion method for lower triangular complex matrices of arbitrary order that supports decomposition-based matrix inversion algorithms, improves computation speed, and saves hardware resources. It is realized by the following technical scheme:
The inversion method for lower triangular complex matrices of arbitrary order comprises the following steps:
(1) A reciprocal unit is provided, which takes the reciprocal of each diagonal element of the N-order matrix L and outputs the matrix with the reciprocals in place;
(2) A multiply-accumulate unit is provided, which receives that matrix, performs multiply-accumulate operations over the first i-1 elements of row i, and outputs the accumulation results for row i, where i is an integer greater than or equal to 2 with initial value 2;
(3) A negate-multiply unit is provided, which receives the accumulation results for the elements of row i, negates them, and multiplies them by the diagonal element of row i to obtain row i of the inverse matrix $L^{-1}$;
(4) i is incremented by 1 and steps (2) and (3) are repeated until i = N, which yields the inverse matrix $L^{-1}$.
In a further design of the method, the multiply-accumulate unit in step (2) follows the formula $S_{ij} = -S_{ii}\sum_{k=j}^{i-1} l_{ik} S_{kj}$ and computes row by row; within each row, the elements in different columns are computed in parallel with degree of parallelism M, where $1 \le j < i \le N$ and $S_{ij}$ denotes an element of the inverse matrix $L^{-1}$.
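The four steps above can be sketched in software as follows. This is a minimal reference model rather than the hardware design: the function name `invert_lower_triangular` and the list-of-lists representation are assumptions made for illustration, and the three inner stages only mirror the roles of the reciprocal, multiply-accumulate, and negate-multiply units.

```python
def invert_lower_triangular(L):
    """Invert a lower triangular complex matrix row by row.

    L is a list of N rows (lists) of numbers; entries above the
    diagonal are ignored. Returns S = L^{-1} as a list of lists.
    """
    n = len(L)
    S = [[0j] * n for _ in range(n)]

    # Step (1): reciprocals of the diagonal elements.
    for i in range(n):
        S[i][i] = 1 / complex(L[i][i])

    # Steps (2)-(4): rows are processed in order, because row i needs
    # the finished rows above it.
    for i in range(1, n):
        for j in range(i):  # columns within one row are independent
            acc = 0j
            for k in range(j, i):  # multiply-accumulate l_ik * S_kj
                acc += L[i][k] * S[k][j]
            # Step (3): negate, then multiply by the diagonal reciprocal.
            S[i][j] = -acc * S[i][i]
    return S


if __name__ == "__main__":
    for row in invert_lower_triangular([[2, 0], [1, 4]]):
        print(row)
```

The loop over `j` carries no dependency between columns of the same row, which is the property the parallel scheme with degree of parallelism M relies on.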
In a further design of the method, the multiply-accumulate unit comprises:
A complex multiplier, which receives the corresponding $l_{ik}$ and $S_{kj}$, performs the complex multiplication of $l_{ik}$ by $S_{kj}$, and outputs the product;
A complex adder, which receives the products, performs the complex additions, and outputs the accumulation result;
And a logic control unit, connected to both the complex multiplier and the complex adder, which implements the self-accumulation function of the pipelined complex adder.
In a further design of the method, a storage scheme is set up in step (2) to match execution with degree of parallelism M. Under this scheme the source matrix L and the result matrix S are stored, respectively, in N different consecutive storage units, ordered by address, where N is an integer greater than or equal to 2M (N here denotes the number of storage units, not the matrix order).
In a further design of the method, a memory access mechanism is established according to the storage scheme. For each element of the matrix, a storage unit k is selected from the column index j of the element and the total number of storage units N, and the element is stored in the kth storage unit, where k is an integer less than or equal to N and k = j mod N. This allows parallel memory access and thus computation with degree of parallelism M.
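The mapping k = j mod N can be illustrated with a short sketch. The helper names `bank_of` and `conflict_free` and the zero-based column indices are assumptions made for this example; the source only specifies the modulo rule.

```python
def bank_of(j, num_banks):
    """Storage unit index for an element in column j (k = j mod N)."""
    return j % num_banks


def conflict_free(columns, num_banks):
    """True if elements in the given columns land in distinct storage
    units and could therefore be read in the same cycle."""
    banks = [bank_of(j, num_banks) for j in columns]
    return len(set(banks)) == len(banks)


if __name__ == "__main__":
    # With 4 storage units, neighbouring columns never collide.
    print(conflict_free([5, 6], 4))
    # Columns that differ by a multiple of 4 share a storage unit.
    print(conflict_free([1, 5], 4))
```

Under this rule, any group of up to N consecutive columns maps to distinct storage units, which is what lets M parallel units fetch their operands without contention.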
In a further design of the method, the reciprocal unit comprises two real multipliers, one real adder, one real divider, and one complex multiplier. The two real multipliers are connected to the inputs of the real adder, the output of the real adder is connected to one input of the real divider, the output of the real divider is connected to an input of the complex multiplier, and the other input of the real divider is fixed to 1.
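One way to read this datapath is as the identity 1/(a+bi) = (a-bi)/(a²+b²). The sketch below follows that reading. It is an assumption: the source does not show which second operand feeds the complex multiplier, so the use of the conjugate and the function name `complex_reciprocal` are illustrative only.

```python
def complex_reciprocal(z):
    """Reciprocal of a nonzero complex number, staged like the unit."""
    a, b = z.real, z.imag
    aa = a * a                # real multiplier 1
    bb = b * b                # real multiplier 2
    norm = aa + bb            # real adder
    inv_norm = 1.0 / norm     # real divider, numerator fixed to 1
    # Complex multiplier: the real quotient times the conjugate a - bi
    # (the conjugate operand is an assumption, see the lead-in).
    return complex(inv_norm, 0.0) * complex(a, -b)


if __name__ == "__main__":
    print(complex_reciprocal(3 + 4j))
```

Fixing the divider numerator to 1 means only one real division is needed per diagonal element; everything else is multiplication and addition.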
In a further design of the method, the negate-multiply unit comprises a delay logic block, a negation logic block, and a complex multiplier; the delay logic block and the negation logic block are each connected to an input of the complex multiplier.
The advantages of the present invention are as follows:
(1) The invention can invert lower triangular complex matrices of arbitrary order; provided the memory capacity is large enough, it is not limited by the number of arithmetic units;
(2) The independently designed pipelined multiply-accumulator uses only one complex adder and one complex multiplier, saving about 15% of hardware resources, and maintains computational efficiency through pipelined operation;
(3) The invention provides an effective parallelization scheme whose degree of parallelism is highly scalable. The degree of parallelism can be chosen flexibly according to the available hardware resources, storage resources, and matrix order, which maximizes hardware utilization and computational efficiency and meets the requirements of high-speed computation.
Brief description of the drawings
Fig. 1 is the matrix inversion algorithm flow chart of the embodiment of the present invention.
Fig. 2 is a circuit structure diagram of the reciprocal unit.
Fig. 3 is a circuit structure diagram of the multiply-accumulate unit.
Fig. 4 is a circuit structure diagram of the negate-multiply unit.
Fig. 5 is a functional simulation diagram of the independently designed multiply-accumulator.
Fig. 6 is a diagram of the four-way parallel execution scheme of the embodiment of the invention.
Fig. 7 is a schematic diagram of the storage rule of the embodiment of the invention.
Fig. 8 is a plot of speed-up ratio against matrix order for four-way parallel execution.
Embodiment
The scheme of the invention is described in detail below with reference to the drawings.
The inversion method for lower triangular complex matrices of arbitrary order provided by this embodiment comprises the following steps:
(1) A reciprocal unit is provided, which takes the reciprocal of each diagonal element of the N-order matrix L and outputs the matrix with the reciprocals in place.
(2) A multiply-accumulate unit is provided, which receives that matrix, performs multiply-accumulate operations over the first i-1 elements of row i, and outputs the accumulation results for row i, where i is an integer greater than or equal to 2 with initial value 2. The multiply-accumulate unit follows the formula $S_{ij} = -S_{ii}\sum_{k=j}^{i-1} l_{ik} S_{kj}$ and computes row by row; within each row, the elements in different columns are computed in parallel with degree of parallelism M, where $1 \le j < i \le N$ and $S_{ij}$ denotes an element of the inverse matrix $L^{-1}$.
(3) A negate-multiply unit is provided, which receives the accumulation results for the elements of row i, negates them, and multiplies them by the diagonal element of row i to obtain row i of the inverse matrix $L^{-1}$.
(4) i is incremented by 1 and steps (2) and (3) are repeated until i = N, which yields the inverse matrix $L^{-1}$.
Further, this embodiment sets up a storage scheme to match execution with degree of parallelism M. Under this scheme the source matrix L and the result matrix S are stored, respectively, in N different consecutive storage units. The N storage units are ordered by address, and N is an integer greater than or equal to 2M.
A memory access mechanism is established according to this storage scheme. For each element of the matrix, a storage unit k is selected from the column index j of the element and the total number of storage units N, and the element is stored in the corresponding kth storage unit, which allows parallel memory access and thus computation with degree of parallelism M. Here storage unit k denotes the kth storage unit, k is an integer less than or equal to N, and k = j mod N.
The scheme is described in detail below with reference to the drawings, taking as an example the optimized inversion of a 96-order complex matrix based on LU decomposition.
In this embodiment, a 96-order single-precision floating-point complex matrix is randomly generated with Matlab and LU-decomposed; the L and U matrices are then inverted according to the following scheme:
For a nonsingular matrix $A$ of order N, the LU decomposition factors $A$ into the product of a lower triangular matrix $L$ whose main diagonal entries are all 1 and an upper triangular matrix $U$, as shown in formula (1).
$$A = LU \tag{1}$$
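For the lower triangular complex matrix L obtained from the LU decomposition, let $S = L^{-1}$; the following formula then holds:
$$S_{ij} = \begin{cases} 1/l_{ii}, & i = j \\ -S_{ii}\sum_{k=j}^{i-1} l_{ik} S_{kj}, & i > j \\ 0, & i < j \end{cases} \tag{2}$$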
Since the L matrix is a unit triangular matrix, the computation steps involving its diagonal elements can be simplified;
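As a small worked instance for a unit lower triangular matrix, take a 3-order matrix with arbitrary complex entries a, b, c (chosen here purely for illustration). Every diagonal reciprocal equals 1, so only the negated accumulations remain:

```latex
L = \begin{pmatrix} 1 & 0 & 0 \\ a & 1 & 0 \\ b & c & 1 \end{pmatrix},
\qquad
\begin{aligned}
S_{21} &= -l_{21} S_{11} = -a, \\
S_{32} &= -l_{32} S_{22} = -c, \\
S_{31} &= -(l_{31} S_{11} + l_{32} S_{21}) = ac - b,
\end{aligned}
\qquad
L^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ -a & 1 & 0 \\ ac - b & -c & 1 \end{pmatrix}.
```

Multiplying the two matrices confirms the result: the third row of $L$ times the first column of $L^{-1}$ gives $b - ac + (ac - b) = 0$.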
For the upper triangular complex matrix U obtained from the LU decomposition, the identity $(U^{T})^{-1} = (U^{-1})^{T}$ can be used: U is first transposed into a lower triangular complex matrix and inverted, and the result is transposed again to give the inverse of U.
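The transposition step relies on the fact that transposing and inverting commute. A short derivation, using the plain transpose (not the conjugate transpose), is:

```latex
U^{T}\,(U^{-1})^{T} = (U^{-1} U)^{T} = I^{T} = I
\;\Longrightarrow\;
(U^{T})^{-1} = (U^{-1})^{T}
\;\Longrightarrow\;
U^{-1} = \bigl((U^{T})^{-1}\bigr)^{T}.
```

Since $U^{T}$ is lower triangular, the same lower triangular inversion datapath serves both factors.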
The invention does not cover the matrix transposition unit; only the inversion of lower triangular matrices is described. In what follows, the U matrix is therefore taken to be the matrix after transposition. Based on the formulas above, the L and U matrices are inverted by the following steps:
(1) Reciprocal operation
According to formula (2), the reciprocals of the diagonal elements of the matrix must be computed first.
For the L matrix, the data input only needs to be set to 1, because the diagonal elements of L are all 1.
For the U matrix, the reciprocals of the diagonal elements must be computed. The structure of the reciprocal unit circuit is shown in Fig. 2; it includes an operand address generation unit and a corresponding result address generation unit. According to formula (3), the reciprocal unit of this embodiment consists mainly of two real multipliers, one real adder, one real divider, and one complex multiplier. The two real multipliers are connected to the inputs of the real adder. The output of the real adder is connected to one input of the real divider. The output of the real divider is connected to an input of the complex multiplier, and the other input of the real divider is fixed to 1.
(2) Multiply-accumulate operation
This step follows the formula $S_{ij} = -S_{ii}\sum_{k=j}^{i-1} l_{ik} S_{kj}$, with the multiply-accumulate hardware evaluating the summation. Because the computation of each row needs the inversion results of the rows above it, this step can process only one row at a time, starting from row 2. The corresponding hardware is the multiply-accumulator.
The multiply-accumulate unit consists mainly of a complex multiplier, a complex adder, and a logic control unit. The complex multiplier receives the corresponding $l_{ik}$ and $S_{kj}$, performs the complex multiplication of $l_{ik}$ by $S_{kj}$, and outputs the product. The complex adder receives the products, performs the complex additions, and outputs the accumulation result. The logic control unit is connected to both the complex multiplier and the complex adder and implements the self-accumulation function of the pipelined complex adder; the circuit structure is shown in Fig. 3. Because the complex adder in this embodiment has a 4-stage pipeline, it cannot accumulate directly. The logic control unit designed for this embodiment therefore consists of two controllers, the control_add_a and control_add_b modules in the figure, which apply a certain delay to the last four numbers produced by self-accumulation so that the accumulation can be completed. To hide the latency introduced by the logic control and improve efficiency, pipelined operation is used. A simple simulation test of the multiply-accumulate unit verified its functional correctness; the result is shown in Fig. 5. The figure shows that the last two accumulations are performed after a delay inserted by the logic control, which realizes the self-accumulation of the complex adder.
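The effect of the adder latency can be modelled in a few lines. The sketch below is a behavioural model only and rests on assumptions: the exact schedule used by control_add_a and control_add_b is not given in the text, so the pairwise final reduction and the name `pipelined_accumulate` are illustrative.

```python
def pipelined_accumulate(products, latency=4):
    """Model self-accumulation on an adder with `latency` stages.

    A result fed back into the adder only becomes available `latency`
    cycles later, so a stream of one product per cycle builds
    `latency` interleaved partial sums instead of a single total.
    """
    partial = [0j] * latency
    for t, p in enumerate(products):
        partial[t % latency] += p

    # Final phase: the remaining partial sums are combined pairwise.
    # In hardware each of these additions has to wait for the adder
    # latency, which is why extra delay is inserted at the end.
    while len(partial) > 1:
        merged = []
        for i in range(0, len(partial), 2):
            merged.append(sum(partial[i:i + 2], 0j))
        partial = merged
    return partial[0]


if __name__ == "__main__":
    print(pipelined_accumulate([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

With four partial sums, the final reduction takes two further additions after the last product has entered the adder, which is consistent with the two delayed accumulations described for Fig. 5.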
(3) Negate-multiply operation
Each row of elements output by step (2) is negated and multiplied, which gives the inversion result for that row; the result is then available for computing the following rows. The negate-multiply unit of this embodiment consists mainly of a delay logic block, a negation logic block, and a complex multiplier. The delay logic block and the negation logic block are each connected to an input of the complex multiplier.
For the L matrix, the diagonal elements are 1, so the multiplication can be omitted and only the sign bits of the input data need to be inverted;
For the U matrix, the input data must be negated and then multiplied by the diagonal element of the corresponding row. The circuit structure of the negate-multiply unit is shown in Fig. 4.
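Negation by sign-bit inversion can be shown on single-precision values as follows. This sketch assumes the IEEE 754 binary32 encoding for the single-precision data; the helper names `flip_sign_bit` and `negate_complex` are hypothetical.

```python
import struct


def flip_sign_bit(x):
    """Negate a single-precision float by toggling bit 31 of its encoding."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (y,) = struct.unpack("<f", struct.pack("<I", bits ^ 0x80000000))
    return y


def negate_complex(z):
    """Negate a complex value by flipping the sign bit of each part."""
    return complex(flip_sign_bit(z.real), flip_sign_bit(z.imag))


if __name__ == "__main__":
    print(negate_complex(1.5 - 2.25j))
```

No arithmetic is involved, which is why the L path can skip the complex multiplier entirely.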
(4) After steps (2) and (3) have been repeated 95 times, the inverse matrices of L and U are obtained. The complete algorithm flow chart is shown in Fig. 1.
According to the invention, multi-way parallel execution can be achieved by using a group of the multiply-accumulate units described above. This embodiment uses four-way parallel execution and therefore needs four multiply-accumulate units. Two of them are L execution units and the other two are U execution units, so the degree of parallelism for L and for U is 2 each. Fig. 6 illustrates the parallel execution scheme for the L and U matrices; elements with different patterns are computed by the four multiply-accumulate units MAC1 to MAC4 respectively. Under the storage rule for a degree of parallelism of 2, the source matrix and the result matrix must be stored, respectively, in 4 different storage units. This embodiment uses a 2 MB SRAM divided into 32 banks of 32*1024*16 bits each, and 3*4 banks are needed to meet the storage requirement of the 96-order matrices. The storage scheme for the L matrix is shown in Fig. 7; the U matrix is stored in the same way.
The multiply-accumulate operation is the core of the whole computation, so parallelizing it is equivalent to parallelizing the whole inversion. Because the number of nonzero elements in a lower triangular matrix grows by one with each row, alternating between odd and even, full parallelization cannot be achieved. The theoretical speed-up ratio of four-way parallel execution over serial execution in this embodiment can be obtained from a formula in the matrix order N, and the trend of the speed-up ratio against matrix order is shown in Fig. 8. The larger the matrix order N, the larger the speed-up ratio and the better the parallel effect. In this embodiment, the theoretical speed-up ratio for 96-order matrix inversion reaches 3.979.
The design of this embodiment is based on a 1 GHz clock signal, and the four-way parallel run time for inverting the 96-order L and U matrices is 112500 ns. According to theoretical calculation, two-way parallel execution would take 203085 ns and four-way parallel execution 108617 ns. The design thus completes lower triangular complex matrix inversion efficiently with low hardware resource consumption, and its parallel efficiency is high: four-way parallel execution is almost twice as fast as two-way. In addition, the divider used in step (1) of this embodiment does not support pipelined operation and is relatively inefficient, which affects the inversion efficiency to some extent; a more efficient divider would further shorten the computation time.
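The figures quoted above can be cross-checked with a few lines of arithmetic. The variable names are chosen here for illustration; the numbers are the ones stated in the text.

```python
CLOCK_HZ = 1_000_000_000          # 1 GHz clock stated in the text
measured_four_way_ns = 112500     # four-way parallel run time
theory_two_way_ns = 203085        # theoretical two-way run time
theory_four_way_ns = 108617       # theoretical four-way run time

ns_per_cycle = 1e9 / CLOCK_HZ
measured_cycles = measured_four_way_ns / ns_per_cycle

# How much faster four-way is than two-way, in theory.
two_to_four_ratio = theory_two_way_ns / theory_four_way_ns

# How far the reported run time is from the theoretical four-way value.
measured_over_theory = measured_four_way_ns / theory_four_way_ns

if __name__ == "__main__":
    print(f"cycles at 1 GHz: {measured_cycles:.0f}")
    print(f"two-way / four-way (theory): {two_to_four_ratio:.3f}")
    print(f"reported / theory (four-way): {measured_over_theory:.3f}")
```

The theoretical ratio comes out at about 1.87, in line with the statement that four-way execution is almost twice as fast as two-way, and the reported run time is roughly 3.6% above the theoretical four-way value.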
Claims (7)
1. An inversion method for lower triangular complex matrices of arbitrary order, characterized in that it comprises the following steps:
(1) providing a reciprocal unit, which takes the reciprocal of each diagonal element of the N-order matrix L and outputs the matrix with the reciprocals in place;
(2) providing a multiply-accumulate unit, which receives that matrix, performs multiply-accumulate operations over the first i-1 elements of row i, and outputs the accumulation results for row i, where i is an integer greater than or equal to 2 with initial value 2;
(3) providing a negate-multiply unit, which receives the accumulation results for the elements of row i, negates them, and multiplies them by the diagonal element of row i to obtain row i of the inverse matrix $L^{-1}$;
(4) incrementing i by 1 and repeating steps (2) and (3) until i = N, which yields the inverse matrix $L^{-1}$.
2. The inversion method for lower triangular complex matrices of arbitrary order according to claim 1, characterized in that the multiply-accumulate unit in step (2) follows the formula $S_{ij} = -S_{ii}\sum_{k=j}^{i-1} l_{ik} S_{kj}$ and computes row by row; within each row, the elements in different columns are computed in parallel with degree of parallelism M, where $1 \le j < i \le N$ and $S_{ij}$ denotes an element of the inverse matrix $L^{-1}$.
3. The inversion method for lower triangular complex matrices of arbitrary order according to claim 2, characterized in that the multiply-accumulate unit comprises:
a complex multiplier, which receives the corresponding $l_{ik}$ and $S_{kj}$, performs the complex multiplication of $l_{ik}$ by $S_{kj}$, and outputs the product;
a complex adder, which receives the products, performs the complex additions, and then outputs the accumulation result;
and a logic control unit, connected to both the complex multiplier and the complex adder, which implements the self-accumulation function of the pipelined complex adder.
4. The inversion method for lower triangular complex matrices of arbitrary order according to claim 3, characterized in that a storage scheme is set up in step (2) to match execution with degree of parallelism M, the storage scheme comprising storing the source matrix L and the result matrix S, respectively, in N different consecutive storage units, the N storage units being ordered by address, where N is an integer greater than or equal to 2M.
5. The inversion method for lower triangular complex matrices of arbitrary order according to claim 4, characterized in that a memory access mechanism is established according to the storage scheme, the memory access mechanism comprising, for each element of the matrix, selecting a storage unit k from the column index j of the current element and the total number of storage units N, and storing the current element in the corresponding kth storage unit, so as to allow parallel memory access and thus computation with degree of parallelism M, where storage unit k denotes the kth storage unit, k is an integer less than or equal to N, and k = j mod N.
6. The inversion method for lower triangular complex matrices of arbitrary order according to claim 1, characterized in that the reciprocal unit comprises two real multipliers, one real adder, one real divider, and one complex multiplier; the two real multipliers are connected to the inputs of the real adder, the output of the real adder is connected to one input of the real divider, the output of the real divider is connected to an input of the complex multiplier, and the other input of the real divider is fixed to 1.
7. The inversion method for lower triangular complex matrices of arbitrary order according to claim 1, characterized in that the negate-multiply unit comprises a delay logic block, a negation logic block, and a complex multiplier, the delay logic block and the negation logic block each being connected to an input of the complex multiplier.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410156677.4A CN103927290A (en) | 2014-04-18 | 2014-04-18 | Inverse operation method for lower triangle complex matrix with any order |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103927290A (en) | 2014-07-16 |
Family
ID=51145513
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201410156677.4A (CN103927290A) | Pending | 2014-04-18 | 2014-04-18 | Inverse operation method for lower triangle complex matrix with any order |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103927290A (en) |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104360986A (en) * | 2014-11-06 | 2015-02-18 | 江苏中兴微通信息科技有限公司 | Realization method of parallelization matrix inversion hardware device |
CN104536943A (en) * | 2015-01-13 | 2015-04-22 | 江苏中兴微通信息科技有限公司 | Low-division-quantity fixed point implementation method and device for matrix inversion |
CN104794002A (en) * | 2014-12-29 | 2015-07-22 | 南京大学 | Multi-channel parallel dividing method based on specific resources and hardware architecture of multi-channel parallel dividing method based on specific resources |
CN105373517A (en) * | 2015-11-09 | 2016-03-02 | 南京大学 | Spark-based distributed matrix inversion parallel operation method |
CN105426345A (en) * | 2015-12-25 | 2016-03-23 | 南京大学 | Matrix inverse operation method |
CN105701068A (en) * | 2016-02-19 | 2016-06-22 | 南京大学 | Cholesky matrix inversion system based on time division multiplexing technology |
CN107203491A (en) * | 2017-05-19 | 2017-09-26 | 电子科技大学 | Triangular systolic array architecture QR decomposer for FPGA |
CN107341133A (en) * | 2017-06-24 | 2017-11-10 | 中国人民解放军信息工程大学 | Scheduling method of reconfigurable computing structure based on LU decomposition of arbitrary dimension matrix |
CN107368459A (en) * | 2017-06-24 | 2017-11-21 | 中国人民解放军信息工程大学 | Scheduling method of reconfigurable computing structure based on arbitrary dimension matrix multiplication |
CN108509386A (en) * | 2018-04-19 | 2018-09-07 | 武汉轻工大学 | Method and apparatus for generating reversible modulo m matrix |
CN108536651A (en) * | 2018-04-19 | 2018-09-14 | 武汉轻工大学 | Method and apparatus for generating reversible modulo m matrix |
CN108874744A (en) * | 2017-05-08 | 2018-11-23 | 辉达公司 | Generalized acceleration of matrix multiply accumulate operations |
CN109558567A (en) * | 2018-11-06 | 2019-04-02 | 海南大学 | Upper triangular part storage device of self-conjugate matrix and parallel reading method |
CN109614149A (en) * | 2018-11-06 | 2019-04-12 | 海南大学 | Upper triangular part storage device of symmetric matrix and parallel reading method |
CN109614582A (en) * | 2018-11-06 | 2019-04-12 | 海南大学 | Lower triangular part storage device of self-conjugate matrix and parallel reading method |
CN109635236A (en) * | 2018-11-06 | 2019-04-16 | 海南大学 | Lower triangular part storage device of symmetric matrix and parallel reading method |
CN109857982A (en) * | 2018-11-06 | 2019-06-07 | 海南大学 | Triangular part storage device of symmetric matrix and parallel reading method |
CN111538946A (en) * | 2020-04-24 | 2020-08-14 | 合肥工业大学 | Quick verification system for operation result |
CN112445752A (en) * | 2019-08-28 | 2021-03-05 | 上海华为技术有限公司 | Matrix inversion device based on Cholesky decomposition |
CN113608219A (en) * | 2021-07-22 | 2021-11-05 | 北京无线电测量研究所 | System and method for realizing uniform azimuth sampling for multi-channel SAR |
US11488664B2 (en) | 2020-10-13 | 2022-11-01 | International Business Machines Corporation | Distributing device array currents across segment mirrors |
CN116662730A (en) * | 2023-08-02 | 2023-08-29 | 之江实验室 | Cholesky decomposition calculation acceleration system based on FPGA |
US11816482B2 (en) | 2017-05-08 | 2023-11-14 | Nvidia Corporation | Generalized acceleration of matrix multiply accumulate operations |
TWI841330B (en) * | 2023-03-31 | 2024-05-01 | 國立臺灣大學 | Gaussian elimination computing system and gaussian elimination computing method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005081443A1 (en) * | 2004-02-25 | 2005-09-01 | Ntt Docomo, Inc. | Apparatus and method for sequence estimation using multiple-input multiple -output filtering |
CN102546088A (en) * | 2010-12-28 | 2012-07-04 | 电子科技大学 | BD (block diagonalization) pre-coding method and device |
- 2014-04-18 CN CN201410156677.4A patent/CN103927290A/en active Pending
Non-Patent Citations (6)
Title |
---|
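Peng Ling: "IP design and implementation of a lower triangular complex matrix inversion method", 《Electronic Test》 * |
Lin Hao: "FPGA-based implementation of matrix operations", 《China Master's Theses Full-text Database, Information Science and Technology》 * |
Ouyang Lianyuan: "《Numerical Computation Methods for Computers》", 31 January 1997 * |
Xiong Yang: "ASIC design and implementation of lower triangular complex matrix inversion", 《Microcomputer Information》 * |
Luo Yu: "FPGA-based divider design", 《Computer & Digital Engineering》 * |
Shao Yi: "Research on hardwired implementation techniques for FPGA-based matrix operations", 《China Master's Theses Full-text Database, Information Science and Technology》 * |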
Cited By (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104360986A (en) * | 2014-11-06 | 2015-02-18 | 江苏中兴微通信息科技有限公司 | Realization method of parallelization matrix inversion hardware device |
CN104360986B (en) * | 2014-11-06 | 2017-07-25 | 江苏中兴微通信息科技有限公司 | Implementation method of a parallelized matrix inversion hardware device |
CN104794002A (en) * | 2014-12-29 | 2015-07-22 | 南京大学 | Multi-channel parallel dividing method based on specific resources and hardware architecture of multi-channel parallel dividing method based on specific resources |
CN104794002B (en) * | 2014-12-29 | 2019-03-22 | 南京大学 | Multi-channel parallel division method and system |
CN104536943A (en) * | 2015-01-13 | 2015-04-22 | 江苏中兴微通信息科技有限公司 | Low-division-quantity fixed point implementation method and device for matrix inversion |
CN104536943B (en) * | 2015-01-13 | 2017-08-29 | 江苏中兴微通信息科技有限公司 | Low-division-quantity fixed-point implementation method and device for matrix inversion |
CN105373517A (en) * | 2015-11-09 | 2016-03-02 | 南京大学 | Spark-based distributed matrix inversion parallel operation method |
CN105426345A (en) * | 2015-12-25 | 2016-03-23 | 南京大学 | Matrix inverse operation method |
CN105701068A (en) * | 2016-02-19 | 2016-06-22 | 南京大学 | Cholesky matrix inversion system based on time division multiplexing technology |
CN105701068B (en) * | 2016-02-19 | 2018-06-19 | 南京大学 | Cholesky matrix inversion system based on time division multiplexing technology |
US11816482B2 (en) | 2017-05-08 | 2023-11-14 | Nvidia Corporation | Generalized acceleration of matrix multiply accumulate operations |
US11816481B2 (en) | 2017-05-08 | 2023-11-14 | Nvidia Corporation | Generalized acceleration of matrix multiply accumulate operations |
CN108874744B (en) * | 2017-05-08 | 2022-06-10 | 辉达公司 | Processor, method and storage medium for performing matrix multiply-and-accumulate operations |
US11797302B2 (en) | 2017-05-08 | 2023-10-24 | Nvidia Corporation | Generalized acceleration of matrix multiply accumulate operations |
CN108874744A (en) * | 2017-05-08 | 2018-11-23 | 辉达公司 | Generalized acceleration of matrix multiply accumulate operations |
US11797301B2 (en) | 2017-05-08 | 2023-10-24 | Nvidia Corporation | Generalized acceleration of matrix multiply accumulate operations |
US11797303B2 (en) | 2017-05-08 | 2023-10-24 | Nvidia Corporation | Generalized acceleration of matrix multiply accumulate operations |
CN107203491A (en) * | 2017-05-19 | 2017-09-26 | 电子科技大学 | Triangular systolic array architecture QR decomposer for FPGA |
CN107368459A (en) * | 2017-06-24 | 2017-11-21 | 中国人民解放军信息工程大学 | Scheduling method of reconfigurable computing structure based on arbitrary dimension matrix multiplication |
CN107368459B (en) * | 2017-06-24 | 2021-01-22 | 中国人民解放军信息工程大学 | Scheduling method of reconfigurable computing structure based on arbitrary dimension matrix multiplication |
CN107341133A (en) * | 2017-06-24 | 2017-11-10 | 中国人民解放军信息工程大学 | Scheduling method of reconfigurable computing structure based on LU decomposition of arbitrary dimension matrix |
CN108509386A (en) * | 2018-04-19 | 2018-09-07 | 武汉轻工大学 | Method and apparatus for generating reversible modulo m matrix |
CN108536651A (en) * | 2018-04-19 | 2018-09-14 | 武汉轻工大学 | Method and apparatus for generating reversible modulo m matrix |
CN108509386B (en) * | 2018-04-19 | 2022-04-08 | 武汉轻工大学 | Method and apparatus for generating reversible modulo m matrix |
CN108536651B (en) * | 2018-04-19 | 2022-04-05 | 武汉轻工大学 | Method and apparatus for generating reversible modulo m matrix |
CN109857982B (en) * | 2018-11-06 | 2020-10-02 | 海南大学 | Triangular part storage device of symmetric matrix and parallel reading method |
CN109614149A (en) * | 2018-11-06 | 2019-04-12 | 海南大学 | Upper triangular part storage device of symmetric matrix and parallel reading method |
CN109857982A (en) * | 2018-11-06 | 2019-06-07 | 海南大学 | Triangular part storage device of symmetric matrix and parallel reading method |
CN109635236A (en) * | 2018-11-06 | 2019-04-16 | 海南大学 | Lower triangular part storage device of symmetric matrix and parallel reading method |
CN109635236B (en) * | 2018-11-06 | 2020-08-21 | 海南大学 | Lower triangular part storage device of symmetric matrix and parallel reading method |
CN109614582A (en) * | 2018-11-06 | 2019-04-12 | 海南大学 | Lower triangular part storage device of self-conjugate matrix and parallel reading method |
CN109558567B (en) * | 2018-11-06 | 2020-08-11 | 海南大学 | Upper triangular part storage device of self-conjugate matrix and parallel reading method |
CN109558567A (en) * | 2018-11-06 | 2019-04-02 | 海南大学 | Upper triangular part storage device of self-conjugate matrix and parallel reading method |
CN112445752A (en) * | 2019-08-28 | 2021-03-05 | 上海华为技术有限公司 | Matrix inversion device based on Cholesky decomposition |
CN112445752B (en) * | 2019-08-28 | 2024-01-05 | 上海华为技术有限公司 | Matrix inversion device based on Cholesky decomposition |
CN111538946A (en) * | 2020-04-24 | 2020-08-14 | 合肥工业大学 | Quick verification system for operation result |
US11488664B2 (en) | 2020-10-13 | 2022-11-01 | International Business Machines Corporation | Distributing device array currents across segment mirrors |
CN113608219A (en) * | 2021-07-22 | 2021-11-05 | 北京无线电测量研究所 | System and method for realizing uniform azimuth sampling for multi-channel SAR |
CN113608219B (en) * | 2021-07-22 | 2023-11-17 | 北京无线电测量研究所 | Multichannel SAR-oriented azimuth uniform sampling realization system and method |
TWI841330B (en) * | 2023-03-31 | 2024-05-01 | 國立臺灣大學 | Gaussian elimination computing system and gaussian elimination computing method |
CN116662730B (en) * | 2023-08-02 | 2023-10-20 | 之江实验室 | Cholesky decomposition calculation acceleration system based on FPGA |
CN116662730A (en) * | 2023-08-02 | 2023-08-29 | 之江实验室 | Cholesky decomposition calculation acceleration system based on FPGA |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103927290A (en) | Inverse operation method for lower triangle complex matrix with any order | |
CN105426345A (en) | Matrix inverse operation method | |
CN102129420B (en) | FPGA implementation device for solving least square problem based on Cholesky decomposition | |
CN102063411A (en) | FFT/IFFT processor based on 802.11n | |
CN107341133B (en) | Scheduling method of reconfigurable computing structure based on LU decomposition of arbitrary dimension matrix | |
CN103092560B (en) | Low-power multiplier based on bypass technology | |
CN109284824B (en) | Reconfigurable technology-based device for accelerating convolution and pooling operation | |
CN102135951B (en) | FPGA (Field Programmable Gate Array) implementation method based on LS-SVM (Least Squares-Support Vector Machine) algorithm restructured at runtime | |
CN102298570A (en) | Hybrid-radix fast Fourier transform (FFT)/inverse fast Fourier transform (IFFT) implementation device with variable counts and method thereof | |
Kono et al. | Scalability analysis of tightly-coupled FPGA-cluster for lattice boltzmann computation | |
CN103970720A (en) | Embedded reconfigurable system based on large-scale coarse granularity and processing method of system | |
CN110543939A (en) | hardware acceleration implementation framework for convolutional neural network backward training based on FPGA | |
CN101763338A (en) | Mixed base FFT/IFFT realization device with changeable points and method thereof | |
CN109508175A (en) | The FPGA design of pseudorandom number generator based on fractional order chaos and Zu Chongzhi's algorithm | |
CN103984677A (en) | Embedded reconfigurable system based on large-scale coarseness and processing method thereof | |
CN105227259A (en) | Parallel M-sequence generation method and device | |
CN104268124A (en) | FFT (Fast Fourier Transform) implementing device and method | |
Song et al. | A fine-grained parallel EMTP algorithm compatible to graphic processing units | |
CN107368459B (en) | Scheduling method of reconfigurable computing structure based on arbitrary dimension matrix multiplication | |
CN115034360A (en) | Processing method and processing device for three-dimensional convolution neural network convolution layer | |
CN111275180B (en) | Convolution operation structure for reducing data migration and power consumption of deep neural network | |
CN106873942B (en) | The method that the MSD multiplication of structure amount computer calculates | |
CN102970545A (en) | Static image compression method based on two-dimensional discrete wavelet transform algorithm | |
CN102004720B (en) | Variable-length fast fourier transform circuit and implementation method | |
CN103902762A (en) | Circuit structure for conducting least square equation solving according to positive definite symmetric matrices |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20140716 |