CN103927290A - Inverse operation method for lower triangle complex matrix with any order - Google Patents

Info

Publication number
CN103927290A
CN103927290A CN201410156677.4A CN201410156677A CN103927290A CN 103927290 A CN103927290 A CN 103927290A CN 201410156677 A CN201410156677 A CN 201410156677A CN 103927290 A CN103927290 A CN 103927290A
Authority
CN
China
Prior art keywords
matrix
complex
unit
operation method
triangle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410156677.4A
Other languages
Chinese (zh)
Inventor
李丽
杨丹
虞潇
潘红兵
何书专
王堃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-04-18
Filing date
2014-04-18
Publication date
2014-07-16
Application filed by Nanjing University
Priority to CN201410156677.4A
Publication of CN103927290A
Legal status: Pending

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention relates to an inversion method for a lower triangular complex matrix of arbitrary order. The method comprises the following steps: (1) a reciprocal unit is provided, which computes the reciprocals of the diagonal elements of an N-order matrix L and outputs the matrix after reciprocation; (2) a multiply-accumulate unit is provided, which receives the reciprocated matrix and performs multiply-accumulate operations on the first through (i-1)-th elements of row i of the matrix; (3) a negate-multiply unit is provided, which receives the accumulation results for row i, negates them, and multiplies them by the diagonal element of row i to obtain the elements of row i of the inverse matrix $L^{-1}$. Throughout the process, several multiply-accumulate units compute in parallel. The method can invert lower triangular complex matrices of any order and is not limited by the number of arithmetic units. Its multiply-accumulator uses only one complex adder and one complex multiplier, which saves hardware resources, and an effective parallelization scheme ensures computational efficiency.

Description

Inversion method for lower triangular complex matrices of arbitrary order
Technical field
The present invention relates to hardware structures and implementation methods for matrix inversion. In particular, it relates to an inversion method for lower triangular complex matrices of arbitrary order.
Background art
There are many methods of matrix inversion, including the adjoint matrix method, elementary transformations, the block matrix method and Gaussian elimination. Most involve complex computation and large storage requirements, so they are not suitable for hardware implementation. Hardware platforms currently rely mainly on methods based on matrix decomposition, of which there are three main kinds: LU decomposition, QR decomposition and Cholesky decomposition. QR decomposition is widely applicable, but its computation is too complex for hardware implementation. Cholesky decomposition is comparatively simple, but it applies only to real symmetric positive definite matrices, so its scope is narrow, and its square-root operations consume considerable hardware resources. The applicability conditions of LU decomposition are easily met and its computational complexity is moderate, which makes it suitable for hardware implementation. All three decompositions produce triangular matrices; for example, LU decomposition factors a matrix into an upper triangular matrix and a lower triangular matrix. Research on triangular matrix inversion therefore has important practical significance.
For hardware implementations of triangular matrix inversion, usually only a lower triangular inversion module needs to be developed. An upper triangular matrix can be transposed so that the same lower triangular inversion module is reused. The most widely used hardware structure for lower triangular matrix inversion at present is the systolic array. Its advantages are high parallelism and few computation cycles, but its drawbacks are equally clear. First, it consumes a great deal of hardware: the number of floating-point units it requires grows quadratically with the matrix order N. Second, its timing control is relatively complex, and data communication between processing elements is very frequent. Despite repeated improvement and optimization, systolic array structures remain very complex and make high-order matrix inversion difficult. No current hardware design solves these problems effectively, so the hardware structure for lower triangular matrix inversion needs to be redesigned and optimized.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art described above. To this end, it provides an inversion method for lower triangular complex matrices of arbitrary order that supports decomposition-based matrix inversion algorithms, increases computation speed and saves hardware resources. The method is realized by the following technical scheme:
The inversion method for lower triangular complex matrices of arbitrary order comprises the following steps:
(1) A reciprocal unit is provided, which computes the reciprocals of the diagonal elements of the N-order matrix L and outputs the matrix after reciprocation;
(2) A multiply-accumulate unit is provided, which receives the reciprocated matrix, performs multiply-accumulate operations on the first i-1 elements of row i, and outputs the accumulation results of row i, where i is an integer greater than or equal to 2 with an initial value of 2;
(3) A negate-multiply unit is provided, which receives the accumulation results for the elements of row i, negates them, and multiplies them by the diagonal element of row i of the reciprocated matrix (that is, $1/l_{ii}$) to obtain the elements of row i of the inverse matrix $L^{-1}$;
(4) i is incremented by 1 and steps (2) and (3) are repeated until i = N, finally yielding the inverse matrix $L^{-1}$.
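The four steps above can be sketched in software as follows. This is a minimal Python model under stated assumptions. The matrix is a nonsingular lower triangular complex matrix given as nested lists, and the function name `invert_lower_triangular` is hypothetical. The loops model the data flow through the reciprocal, multiply-accumulate and negate-multiply units, not their hardware timing.

```python
def invert_lower_triangular(L):
    """Invert a nonsingular lower triangular complex matrix row by row.

    L is an n x n nested list; entries above the diagonal are never read.
    """
    n = len(L)
    S = [[0j] * n for _ in range(n)]
    # Step (1): the reciprocal unit, applied to every diagonal element.
    for i in range(n):
        S[i][i] = 1 / L[i][i]
    # Steps (2)-(4): rows in order, because row i reads rows 1 .. i-1 of S.
    for i in range(1, n):              # rows 2 .. N in 1-based numbering
        for j in range(i):             # columns 1 .. i-1
            acc = 0j
            for k in range(j, i):      # multiply-accumulate: sum of l_ik * S_kj
                acc += L[i][k] * S[k][j]
            S[i][j] = -acc * S[i][i]   # step (3): negate, then multiply by 1/l_ii
    return S


L = [[2, 0, 0], [1 + 1j, 1j, 0], [3, -1, 1 - 1j]]
S = invert_lower_triangular(L)
```

Row i reads only rows 1 to i-1 of S, so the outer loop cannot be parallelized across rows. The parallelism the method exploits is across the columns j within a single row.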
In a further design of the method, the multiply-accumulate unit in step (2) evaluates $\sum_{k=j}^{i-1} l_{ik} S_{kj}$ row by row. Within the computation of each row, the elements of different columns are computed in parallel with a degree of parallelism M, where $S_{ij}$ denotes an element of the inverse matrix $L^{-1}$.
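To illustrate the column-wise parallelism, the following sketch splits the outputs of one row across M multiply-accumulate lanes. The round-robin rule (column j goes to lane (j - 1) mod M) and the helper name `schedule_row` are assumptions made for illustration. The assignment actually used in the embodiment is the one in Fig. 6 and may differ.

```python
def schedule_row(i, M):
    """Split the outputs S_ij (j = 1 .. i-1) of row i across M MAC lanes.

    Assumed mapping: column j goes to lane (j - 1) % M (round robin).
    Computing S_ij takes i - j multiply-accumulates (k = j .. i-1).
    Returns the columns handled by each lane and each lane's MAC count.
    """
    lanes = [[] for _ in range(M)]
    for j in range(1, i):
        lanes[(j - 1) % M].append(j)
    macs = [sum(i - j for j in cols) for cols in lanes]
    return lanes, macs


# Row 6 on two lanes: the busier lane sets the latency of the row.
lanes, macs = schedule_row(6, 2)
print(lanes, macs)  # [[1, 3, 5], [2, 4]] [9, 6]
```

The two lanes carry unequal loads (9 versus 6 multiply-accumulates for row 6). This shows why the varying number of elements per row keeps the achievable speed-up somewhat below M.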
In a further design of the method, the multiply-accumulate unit comprises:
a complex multiplier, which receives the corresponding $l_{ik}$ and $S_{kj}$, computes their complex product and outputs the product;
a complex adder, which receives the products, performs complex addition and outputs the accumulation result;
and a logic control unit, communicatively connected to the complex multiplier and the complex adder, which enables the pipelined complex adder to accumulate its own output (self-accumulation).
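A pipelined adder cannot simply add each new product to the previous sum, because that sum is still in the pipeline. The sketch below models one common way to make such an adder self-accumulate. With a pipeline `stages` deep, `stages` interleaved partial sums circulate, and the remaining partial sums are combined pairwise at the end. This is an assumed behavioral model of what the control_add_a/control_add_b logic achieves, not a description of those modules, and the function name is hypothetical.

```python
from collections import deque


def pipelined_accumulate(values, stages=4):
    """Behavioral model of self-accumulation on a `stages`-deep pipelined adder.

    Each cycle the adder accepts one new product, but a sum leaves the
    pipeline only `stages` cycles after it entered. Feeding each new product
    in together with the sum that is leaving keeps `stages` partial sums in
    flight; each collects the inputs whose cycle index t has the same t % stages.
    """
    pipe = deque([0j] * stages)       # sums in flight, oldest on the left
    for x in values:
        leaving = pipe.popleft()      # partial sum completing this cycle
        pipe.append(leaving + x)      # re-enter it together with the new input
    partials = list(pipe)
    # Drain: combine the partial sums pairwise (4 -> 2 -> 1 for four stages);
    # in hardware each pass has to wait out the adder latency.
    level = partials
    while len(level) > 1:
        half = (len(level) + 1) // 2
        level = [level[t] + (level[t + half] if t + half < len(level) else 0j)
                 for t in range(half)]
    return level[0], partials


total, partials = pipelined_accumulate([1, 2j, 3, 4, 5 + 1j, 6])
```

With four stages, the drain needs two pairwise passes. Each pass has to be delayed until the previous results leave the pipeline.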
In a further design of the method, when step (2) is executed with a degree of parallelism M, a corresponding storage scheme is used. The original matrix L and the result matrix S are each stored in N different contiguous storage units arranged in address order. Here N is the number of storage units, not the matrix order, and is an integer greater than or equal to 2M.
In a further design of the method, a memory access mechanism is built on this storage scheme. For each element of the matrix, a storage unit k is chosen from the element's column index j and the total number N of storage units as k = j mod N, where k denotes the k-th storage unit and is an integer not greater than N. The element is then stored in the k-th group of storage units, so that parallel memory access supports operations with a degree of parallelism M.
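The interleaving rule k = j mod N can be checked with a small model. In this sketch, elements of a lower triangular matrix are placed into banks by column index, and `conflict_free` tests whether a set of columns can be read in the same cycle because they fall in different banks. The helper names are hypothetical, and appending elements in row-major order within each bank is an assumption.

```python
def bank_layout(n, num_banks):
    """Place the lower triangular elements (i, j), 1-based, of an n-order
    matrix into banks using k = j mod num_banks.

    Inside each bank, elements are appended in row-major order (assumed
    addressing). Returns the bank contents and a map (i, j) -> (k, address).
    """
    banks = [[] for _ in range(num_banks)]
    where = {}
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            k = j % num_banks
            where[(i, j)] = (k, len(banks[k]))
            banks[k].append((i, j))
    return banks, where


def conflict_free(cols, num_banks):
    """True if all the given columns sit in different banks."""
    ks = [j % num_banks for j in cols]
    return len(set(ks)) == len(ks)


banks, where = bank_layout(96, 4)
print(where[(96, 95)][0])                               # 3
print(conflict_free([7, 8], 4), conflict_free([3, 7], 4))  # True False
```

Any run of up to N consecutive columns lands in N distinct banks. As a result, M ≤ N lanes that read neighbouring columns do not collide.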
In a further design of the method, the reciprocal unit comprises two real multipliers, one real adder, one real divider and one complex multiplier. The outputs of the two real multipliers feed the inputs of the real adder, and the output of the real adder feeds an input of the real divider. The output of the real divider feeds an input of the complex multiplier, and the other input of the real divider is fixed at 1.
In a further design of the method, the negate-multiply unit comprises a delay logic block, a negation logic block and a complex multiplier. The delay logic block and the negation logic block are each connected to an input of the complex multiplier.
The advantages of the present invention are as follows:
(1) The invention can invert lower triangular complex matrices of arbitrary order. As long as the memory capacity is large enough, it is not limited by the number of arithmetic units;
(2) The in-house designed pipelined multiply-accumulator uses only one complex adder and one complex multiplier, which saves about 15% of hardware resources, while pipelined operation maintains computational efficiency;
(3) The invention provides an effective parallelization scheme whose degree of parallelism is highly scalable. The degree of parallelism can be chosen flexibly according to hardware resources, storage resources and matrix order, which maximizes hardware utilization and computational efficiency and meets the requirements of high-speed computing.
Brief description of the drawings
Fig. 1 is the flow chart of the matrix inversion algorithm of the embodiment of the present invention.
Fig. 2 is the circuit structure diagram of the reciprocal unit.
Fig. 3 is the circuit structure diagram of the multiply-accumulate unit.
Fig. 4 is the circuit structure diagram of the negate-multiply unit.
Fig. 5 is the functional simulation result of the in-house designed multiply-accumulator.
Fig. 6 illustrates the four-way parallel execution scheme of the embodiment of the present invention.
Fig. 7 is a schematic diagram of the storage rule of the embodiment of the present invention.
Fig. 8 shows the trend of speed-up against matrix order for four-way parallel execution.
Embodiment
The scheme of the present invention is described in detail below with reference to the drawings.
The inversion method for lower triangular complex matrices of arbitrary order provided by this embodiment comprises the following steps:
(1) A reciprocal unit is provided, which computes the reciprocals of the diagonal elements of the N-order matrix L and outputs the matrix after reciprocation.
(2) A multiply-accumulate unit is provided, which receives the reciprocated matrix, performs multiply-accumulate operations on the first i-1 elements of row i, and outputs the accumulation results of row i, where i is an integer greater than or equal to 2 with an initial value of 2. The multiply-accumulate unit evaluates $\sum_{k=j}^{i-1} l_{ik} S_{kj}$ row by row, and within each row the elements of different columns are computed in parallel with a degree of parallelism M, where $S_{ij}$ denotes an element of the inverse matrix $L^{-1}$.
(3) A negate-multiply unit is provided, which receives the accumulation results for the elements of row i, negates them, and multiplies them by the diagonal element of row i to obtain the elements of row i of the inverse matrix $L^{-1}$.
(4) i is incremented by 1 and steps (2) and (3) are repeated until i = N, finally yielding the inverse matrix $L^{-1}$.
Further, for execution with a degree of parallelism M, this embodiment uses a corresponding storage scheme. The original matrix L and the result matrix S are each stored in N different contiguous storage units arranged in address order, where N is an integer greater than or equal to 2M.
A memory access mechanism is built on this storage scheme. For each matrix element, a storage unit k is chosen from the element's column index j and the total number N of storage units, and the element is stored in the k-th group of storage units. This enables parallel memory access for operations with a degree of parallelism M. Here k denotes the k-th storage unit, k is an integer not greater than N, and k = j mod N.
The scheme is described in detail below with reference to the drawings, taking as an example the optimized inversion of a 96-order complex matrix based on LU decomposition.
In this embodiment, a 96-order single-precision floating-point complex matrix is randomly generated with MATLAB and LU-decomposed. The resulting L and U matrices are then inverted using the following scheme:
For an N-order nonsingular matrix A, LU decomposition factors A into the product of a lower triangular matrix L, whose main-diagonal elements are all 1, and an upper triangular matrix U, as in formula (1):
$$A = LU \qquad (1)$$
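For the lower triangular complex matrix L obtained by LU decomposition, let $S = L^{-1} = (S_{ij})$. Its elements are then given by:
$$S_{ij} = \begin{cases} 1/l_{ii}, & i = j \\ -S_{ii} \sum_{k=j}^{i-1} l_{ik} S_{kj}, & i > j \\ 0, & i < j \end{cases} \qquad (2)$$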
Because L is a unit lower triangular matrix, the steps that involve its diagonal elements can be simplified.
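The following worked example of formula (2) is hand-computed and not taken from the embodiment. Consider a 3-order unit lower triangular matrix with arbitrary complex entries a, b and c. Since every $S_{ii} = 1$, each off-diagonal element is simply the negated accumulation:

```latex
L = \begin{pmatrix} 1 & 0 & 0 \\ a & 1 & 0 \\ b & c & 1 \end{pmatrix},\quad
S_{21} = -l_{21}S_{11} = -a,\quad
S_{32} = -l_{32}S_{22} = -c,\quad
S_{31} = -(l_{31}S_{11} + l_{32}S_{21}) = ac - b,
\qquad
L^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ -a & 1 & 0 \\ ac - b & -c & 1 \end{pmatrix}
```

Multiplying L by this matrix gives the identity. For example, the (3,1) entry is b - ca + (ac - b) = 0.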
For the upper triangular complex matrix U obtained by LU decomposition, the identity $U^{-1} = \left((U^{T})^{-1}\right)^{T}$ can be used. U is first transposed into a lower triangular complex matrix, that matrix is inverted, and the result is transposed again to give the inverse of U.
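The overall flow of the embodiment can be sketched end to end: LU-decompose A, invert L directly, and invert U through its transpose. This is an illustration under stated assumptions. It assumes a Doolittle LU factorization without pivoting exists (all leading principal minors nonzero), and it uses plain Python complex arithmetic rather than single-precision hardware. It forms $A^{-1} = U^{-1}L^{-1}$ only to check the result, and all function names are hypothetical.

```python
def lu_doolittle(A):
    """A = L U with unit lower triangular L; no pivoting (assumed to exist)."""
    n = len(A)
    L = [[0j] * n for _ in range(n)]
    U = [[0j] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = 1 + 0j
        for j in range(i, n):          # row i of U
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        for r in range(i + 1, n):      # column i of L
            L[r][i] = (A[r][i] - sum(L[r][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U


def invert_lower(T):
    """Row-by-row inverse of a nonsingular lower triangular matrix, per formula (2)."""
    n = len(T)
    S = [[0j] * n for _ in range(n)]
    for i in range(n):
        S[i][i] = 1 / T[i][i]
        for j in range(i):
            acc = sum(T[i][k] * S[k][j] for k in range(j, i))
            S[i][j] = -S[i][i] * acc
    return S


def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(X, Y):
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y))) for c in range(len(Y[0]))]
            for r in range(len(X))]


def invert_via_lu(A):
    L, U = lu_doolittle(A)
    L_inv = invert_lower(L)                         # unit diagonal, so S_ii = 1
    U_inv = transpose(invert_lower(transpose(U)))   # U^-1 = ((U^T)^-1)^T
    return matmul(U_inv, L_inv), L_inv, U_inv       # A^-1 = U^-1 L^-1


A = [[4, 1 + 1j, 2], [2j, 3, 1], [1, -1, 5 - 2j]]
A_inv, L_inv, U_inv = invert_via_lu(A)
```

Only the lower triangular routine is needed. U reuses it through two transpositions, mirroring the way the hardware reuses a single lower triangular inversion module.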
The present invention does not include a matrix transpose unit and describes only the inversion of lower triangular matrices. Accordingly, the U matrix below is taken to be the already-transposed matrix. Using the formulas above, L and U are inverted by the following steps:
(1) Reciprocal computation
According to formula (2), the reciprocals of the diagonal elements of the matrix are computed first.
For the L matrix, the data input only needs to be set to 1, because all diagonal elements of L are 1.
For the U matrix, the reciprocals of the diagonal elements must actually be computed. For a diagonal element $d = a + jb$:
$$\frac{1}{a + jb} = (a - jb)\cdot\frac{1}{a^{2} + b^{2}} \qquad (3)$$
The circuit structure of the reciprocal unit is shown in Fig. 2, which also contains an operand address generation unit and a corresponding result address generation unit. Following formula (3), the reciprocal unit of this embodiment consists mainly of two real multipliers, one real adder, one real divider and one complex multiplier. The outputs of the two real multipliers feed the inputs of the real adder, and the output of the real adder feeds an input of the real divider. The output of the real divider feeds an input of the complex multiplier, and the other input of the real divider is fixed at 1.
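Formula (3) maps directly onto the datapath described above, and the sketch below follows it stage by stage. We assume that the conjugate a - jb is supplied to the second input of the complex multiplier, since the text does not say where it comes from. The function name is hypothetical.

```python
def reciprocal_unit(d: complex) -> complex:
    """Compute 1/d for d = a + jb following formula (3) stage by stage.

    d must be nonzero, which holds for the diagonal of a nonsingular
    triangular matrix.
    """
    a, b = d.real, d.imag
    p1 = a * a                    # real multiplier 1
    p2 = b * b                    # real multiplier 2
    mag2 = p1 + p2                # real adder: a^2 + b^2
    inv = 1.0 / mag2              # real divider, dividend fixed at 1
    # Complex multiplier: conjugate (a - jb) times the real factor 1/(a^2 + b^2).
    return complex(a, -b) * complex(inv, 0.0)


r = reciprocal_unit(3 + 4j)
```

This arrangement needs only one real division per diagonal element. In the embodiment, that division is also the non-pipelined bottleneck.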
(2) Multiply-accumulate operation
This step evaluates the accumulation $\sum_{k=j}^{i-1} l_{ik} S_{kj}$ in formula (2). Computing each row requires the inverted results of the preceding rows, so only one row can be computed at a time, starting from row 2. The corresponding hardware is the multiply-accumulator.
The multiply-accumulate unit consists mainly of a complex multiplier, a complex adder and a logic control module. The complex multiplier receives the corresponding $l_{ik}$ and $S_{kj}$, computes their complex product and outputs the product. The complex adder receives the products, performs complex addition and outputs the accumulation result. The logic control unit is communicatively connected to both the complex multiplier and the complex adder and enables the pipelined complex adder to accumulate its own output; the circuit structure is shown in Fig. 3.

Because the complex adder in this embodiment uses a 4-stage pipeline, it cannot perform accumulation directly. To solve this, the logic control unit of this embodiment is designed as two controllers, the control_add_a and control_add_b modules in the figure. These controllers delay the last four partial sums produced by self-accumulation by suitable amounts so that they can be added together. Pipelined operation is used to hide the latency introduced by the logic control and improve efficiency. A simple simulation test of the multiply-accumulate unit verified its functional correctness; the result is shown in Fig. 5. As the figure shows, the logic control inserts delays for the final two accumulation passes so that the complex adder completes its self-accumulation.
(3) Negate-multiply operation
Each row of elements output by step (2) is negated and multiplied, which gives the inverted results for that row; these results are then used to compute the following rows. The negate-multiply unit of this embodiment consists mainly of a delay logic block, a negation logic block and a complex multiplier. The delay logic block and the negation logic block are each connected to an input of the complex multiplier.
For the L matrix, the diagonal elements are 1, so the multiplication can be omitted and only the sign bit of the input data needs to be inverted;
For the U matrix, the input data must be negated and then multiplied by the diagonal element of the corresponding row. The circuit structure of the negate-multiply unit is shown in Fig. 4.
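The two paths of the negate-multiply unit can be modelled as follows. Negation toggles the IEEE-754 single-precision sign bit (bit 31), as the L path does. The helper names and the `unit_diag` switch are hypothetical, and values passed through `flip_sign_f32` are rounded to float32.

```python
import struct


def flip_sign_f32(x: float) -> float:
    """Negate x as a float32 by toggling the IEEE-754 sign bit (bit 31)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits ^ 0x80000000))[0]


def negate_multiply(acc: complex, diag_recip: complex, unit_diag: bool) -> complex:
    """One output of the negate-multiply unit for row i.

    acc: accumulation result from the multiply-accumulate unit.
    diag_recip: reciprocated diagonal element of row i (1/u_ii).
    unit_diag: True for the L path, where the diagonal is 1.
    """
    # Negate the real and imaginary parts separately by flipping sign bits.
    neg = complex(flip_sign_f32(acc.real), flip_sign_f32(acc.imag))
    if unit_diag:
        return neg                # L: multiplication by 1 is skipped
    return neg * diag_recip       # U (transposed): multiply by 1/u_ii


print(negate_multiply(2 - 1j, 1, True))   # (-2+1j)
```

On the L path no multiplier is involved at all, which is why the embodiment can skip the complex multiplication when the diagonal is known to be 1.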
(4) After steps (2) and (3) have been repeated 95 times (once for each of rows 2 to 96), the inverses of L and U are obtained. The complete algorithm flow chart is shown in Fig. 1.
According to the present invention, multi-way parallel execution can be achieved with a group of the above multiply-accumulate units. This embodiment uses four-way parallel execution and therefore four multiply-accumulate units. Two units execute the L computation and the other two execute the U computation, so L and U each have a degree of parallelism of 2. Fig. 6 illustrates the parallel execution scheme for L and U; elements with different patterns are computed by the four multiply-accumulate units MAC1 to MAC4. Under the storage rule for a degree of parallelism of 2, the source matrix and the result matrix must each be stored in 4 different storage units. This embodiment uses a 2 MB SRAM divided into 32 banks of 32 × 1024 × 16 bit each, so 3 × 4 banks are needed to meet the storage requirements of a 96-order matrix. The storage scheme for the L matrix is shown in Fig. 7, and the U matrix is stored in the same way.
The multiply-accumulate operation is the core of the whole computation, so parallelizing it amounts to parallelizing the whole inversion. The number of nonzero elements in a lower triangular matrix grows by one from row to row, alternating between odd and even counts. As a result, the work cannot be divided perfectly evenly among the parallel units. The theoretical speed-up of the four-way parallel execution in this embodiment relative to serial execution can be derived by formula, and its trend against matrix order is shown in Fig. 8. The larger the matrix order N, the larger the speed-up and the better the parallel effect. For the 96-order matrix inversion in this embodiment, the theoretical speed-up reaches 3.979.
The design of this embodiment is based on a 1 GHz clock, and the measured four-way parallel run time for inverting the 96-order L and U matrices is 112500 ns. By theoretical calculation, two-way parallel execution requires 203085 ns and four-way parallel execution requires 108617 ns. The design therefore inverts lower triangular complex matrices efficiently with low hardware resource consumption and high parallel efficiency: four-way parallel execution is almost twice as fast as two-way execution (203085 / 108617 ≈ 1.87 in theory). In addition, the divider used in step (1) of this embodiment does not support pipelined operation and has low throughput, which somewhat reduces the overall inversion efficiency. A more efficient divider could further shorten the computation time.

Claims (7)

1. An inversion method for a lower triangular complex matrix of arbitrary order, characterized by comprising the following steps:
(1) providing a reciprocal unit for computing the reciprocals of the diagonal elements of an N-order matrix L and outputting the matrix after reciprocation;
(2) providing a multiply-accumulate unit for receiving the reciprocated matrix, performing multiply-accumulate operations on the first i-1 elements of row i of the matrix, and outputting the accumulation results of row i, where i is an integer greater than or equal to 2 with an initial value of 2;
(3) providing a negate-multiply unit for receiving the accumulation results corresponding to the elements of row i, negating them and then multiplying them by the diagonal element of row i, to obtain the elements of row i of the inverse matrix $L^{-1}$;
(4) incrementing i by 1 and repeating steps (2) and (3) until i = N, finally obtaining the inverse matrix $L^{-1}$.
2. The inversion method for a lower triangular complex matrix of arbitrary order according to claim 1, characterized in that the multiply-accumulate unit in step (2) evaluates $\sum_{k=j}^{i-1} l_{ik} S_{kj}$ row by row, and within the computation of each row the elements of different columns are computed in parallel with a degree of parallelism M, where $S_{ij}$ denotes an element of the inverse matrix $L^{-1}$.
3. The inversion method for a lower triangular complex matrix of arbitrary order according to claim 2, characterized in that the multiply-accumulate unit comprises:
a complex multiplier for receiving the corresponding $l_{ik}$ and $S_{kj}$, computing the complex product of $l_{ik}$ and $S_{kj}$, and outputting the product;
a complex adder for receiving the products, performing complex addition, and then outputting the accumulation result;
and a logic control unit, communicatively connected to the complex multiplier and the complex adder respectively, for enabling the pipelined complex adder to perform self-accumulation.
4. The inversion method for a lower triangular complex matrix of arbitrary order according to claim 3, characterized in that, when step (2) is executed with a degree of parallelism M, a corresponding storage scheme is provided, in which the original matrix L and the result matrix S are each stored in N different contiguous storage units arranged in address order, where N is an integer greater than or equal to 2M.
5. The inversion method for a lower triangular complex matrix of arbitrary order according to claim 4, characterized in that a memory access mechanism is established according to the storage scheme, in which, for each element of the matrix, a storage unit k is determined from the column index j of the element and the total number N of storage units, and the element is stored in the corresponding k-th group of storage units, thereby enabling parallel memory access for operations with a degree of parallelism M, where storage unit k denotes the k-th storage unit, k is an integer not greater than N, and k = j mod N.
6. The inversion method for a lower triangular complex matrix of arbitrary order according to claim 1, characterized in that the reciprocal unit comprises two real multipliers, one real adder, one real divider and one complex multiplier, wherein the two real multipliers are each connected to an input of the real adder, the output of the real adder is connected to an input of the real divider, the output of the real divider is connected to an input of the complex multiplier, and the other input of the real divider is fixed at 1.
7. The inversion method for a lower triangular complex matrix of arbitrary order according to claim 1, characterized in that the negate-multiply unit comprises a delay logic block, a negation logic block and a complex multiplier, the delay logic block and the negation logic block each being connected to an input of the complex multiplier.
CN201410156677.4A 2014-04-18 2014-04-18 Inverse operation method for lower triangle complex matrix with any order Pending CN103927290A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410156677.4A CN103927290A (en) 2014-04-18 2014-04-18 Inverse operation method for lower triangle complex matrix with any order

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410156677.4A CN103927290A (en) 2014-04-18 2014-04-18 Inverse operation method for lower triangle complex matrix with any order

Publications (1)

Publication Number Publication Date
CN103927290A 2014-07-16

Family

ID=51145513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410156677.4A Pending CN103927290A (en) 2014-04-18 2014-04-18 Inverse operation method for lower triangle complex matrix with any order

Country Status (1)

Country Link
CN (1) CN103927290A (en)

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104360986A (en) * 2014-11-06 2015-02-18 江苏中兴微通信息科技有限公司 Realization method of parallelization matrix inversion hardware device
CN104536943A (en) * 2015-01-13 2015-04-22 江苏中兴微通信息科技有限公司 Low-division-quantity fixed point implementation method and device for matrix inversion
CN104794002A (en) * 2014-12-29 2015-07-22 南京大学 Multi-channel parallel dividing method based on specific resources and hardware architecture of multi-channel parallel dividing method based on specific resources
CN105373517A (en) * 2015-11-09 2016-03-02 南京大学 Spark-based distributed matrix inversion parallel operation method
CN105426345A (en) * 2015-12-25 2016-03-23 南京大学 Matrix inverse operation method
CN105701068A (en) * 2016-02-19 2016-06-22 南京大学 Cholesky matrix inversion system based on time division multiplexing technology
CN107203491A (en) * 2017-05-19 2017-09-26 电子科技大学 A kind of triangle systolic array architecture QR decomposers for FPGA
CN107341133A (en) * 2017-06-24 2017-11-10 中国人民解放军信息工程大学 The dispatching method of Reconfigurable Computation structure based on Arbitrary Dimensions LU Decomposition
CN107368459A (en) * 2017-06-24 2017-11-21 中国人民解放军信息工程大学 The dispatching method of Reconfigurable Computation structure based on Arbitrary Dimensions matrix multiplication
CN108509386A (en) * 2018-04-19 2018-09-07 武汉轻工大学 The method and apparatus for generating reversible modal m matrix
CN108536651A (en) * 2018-04-19 2018-09-14 武汉轻工大学 The method and apparatus for generating reversible modal m matrix
CN108874744A (en) * 2017-05-08 2018-11-23 辉达公司 The broad sense of matrix product accumulating operation accelerates
CN109558567A (en) * 2018-11-06 2019-04-02 海南大学 The upper triangular portions storage device of self adjoint matrix and parallel read method
CN109614149A (en) * 2018-11-06 2019-04-12 海南大学 The upper triangular portions storage device of symmetrical matrix and parallel read method
CN109614582A (en) * 2018-11-06 2019-04-12 海南大学 The lower triangular portions storage device of self adjoint matrix and parallel read method
CN109635236A (en) * 2018-11-06 2019-04-16 海南大学 The lower triangular portions storage device of symmetrical matrix and parallel read method
CN109857982A (en) * 2018-11-06 2019-06-07 海南大学 The triangular portions storage device and parallel read method of symmetrical matrix
CN111538946A (en) * 2020-04-24 2020-08-14 合肥工业大学 Quick verification system for operation result
CN112445752A (en) * 2019-08-28 2021-03-05 上海华为技术有限公司 Matrix inversion device based on cholesky decomposition
CN113608219A (en) * 2021-07-22 2021-11-05 北京无线电测量研究所 System and method for realizing uniform azimuth sampling for multi-channel SAR
US11488664B2 (en) 2020-10-13 2022-11-01 International Business Machines Corporation Distributing device array currents across segment mirrors
CN116662730A (en) * 2023-08-02 2023-08-29 之江实验室 Cholesky decomposition calculation acceleration system based on FPGA
US11816482B2 (en) 2017-05-08 2023-11-14 Nvidia Corporation Generalized acceleration of matrix multiply accumulate operations
TWI841330B (en) * 2023-03-31 2024-05-01 國立臺灣大學 Gaussian elimination computing system and gaussian elimination computing method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005081443A1 (en) * 2004-02-25 2005-09-01 Ntt Docomo, Inc. Apparatus and method for sequence estimation using multiple-input multiple -output filtering
CN102546088A (en) * 2010-12-28 2012-07-04 电子科技大学 BD (block diagonalization) pre-coding method and device

Non-Patent Citations (6)

Title
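彭玲 (Peng Ling): "IP design and implementation of a lower triangular complex matrix inversion method", Electronic Test (《电子测试》) *
林皓 (Lin Hao): "FPGA-based implementation of matrix operations", China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库信息科技辑》) *
欧阳联渊 (Ouyang Lianyuan): "Numerical Computation Methods for Computers" (《计算机数值计算方法》), 31 January 1997 *
熊洋 (Xiong Yang): "ASIC design and implementation of lower triangular complex matrix inversion", Microcomputer Information (《微计算机信息》) *
罗瑜 (Luo Yu): "FPGA-based divider design", Computer & Digital Engineering (《计算机与数字工程》) *
邵仪 (Shao Yi): "Research on hardened implementation techniques for FPGA-based matrix operations", China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库信息科技辑》) *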

Cited By (42)

Publication number Priority date Publication date Assignee Title
CN104360986A (en) * 2014-11-06 2015-02-18 江苏中兴微通信息科技有限公司 Implementation method for a parallelized matrix inversion hardware device
CN104360986B (en) * 2014-11-06 2017-07-25 江苏中兴微通信息科技有限公司 Implementation method for a parallelized matrix inversion hardware device
CN104794002A (en) * 2014-12-29 2015-07-22 南京大学 Multi-channel parallel division method based on specific resources and its hardware architecture
CN104794002B (en) * 2014-12-29 2019-03-22 南京大学 Multi-channel parallel division method and system
CN104536943A (en) * 2015-01-13 2015-04-22 江苏中兴微通信息科技有限公司 Fixed-point matrix inversion method and device with a reduced number of divisions
CN104536943B (en) * 2015-01-13 2017-08-29 江苏中兴微通信息科技有限公司 Fixed-point matrix inversion method and device with a reduced number of divisions
CN105373517A (en) * 2015-11-09 2016-03-02 南京大学 Spark-based distributed matrix inversion parallel operation method
CN105426345A (en) * 2015-12-25 2016-03-23 南京大学 Matrix inverse operation method
CN105701068A (en) * 2016-02-19 2016-06-22 南京大学 Cholesky matrix inversion system based on time division multiplexing technology
CN105701068B (en) * 2016-02-19 2018-06-19 南京大学 Cholesky matrix inversion systems based on time-sharing multiplexing technology
US11816482B2 (en) 2017-05-08 2023-11-14 Nvidia Corporation Generalized acceleration of matrix multiply accumulate operations
US11816481B2 (en) 2017-05-08 2023-11-14 Nvidia Corporation Generalized acceleration of matrix multiply accumulate operations
CN108874744B (en) * 2017-05-08 2022-06-10 辉达公司 Processor, method and storage medium for performing matrix multiply-and-accumulate operations
US11797302B2 (en) 2017-05-08 2023-10-24 Nvidia Corporation Generalized acceleration of matrix multiply accumulate operations
CN108874744A (en) * 2017-05-08 2018-11-23 辉达公司 Generalized acceleration of matrix multiply-accumulate operations
US11797301B2 (en) 2017-05-08 2023-10-24 Nvidia Corporation Generalized acceleration of matrix multiply accumulate operations
US11797303B2 (en) 2017-05-08 2023-10-24 Nvidia Corporation Generalized acceleration of matrix multiply accumulate operations
CN107203491A (en) * 2017-05-19 2017-09-26 电子科技大学 Triangular systolic array QR decomposer architecture for FPGA
CN107368459A (en) * 2017-06-24 2017-11-21 中国人民解放军信息工程大学 Scheduling method of a reconfigurable computing structure based on arbitrary-dimension matrix multiplication
CN107368459B (en) * 2017-06-24 2021-01-22 中国人民解放军信息工程大学 Scheduling method of a reconfigurable computing structure based on arbitrary-dimension matrix multiplication
CN107341133A (en) * 2017-06-24 2017-11-10 中国人民解放军信息工程大学 Scheduling method of a reconfigurable computing structure based on arbitrary-dimension LU decomposition
CN108509386A (en) * 2018-04-19 2018-09-07 武汉轻工大学 Method and apparatus for generating an invertible modulo-m matrix
CN108536651A (en) * 2018-04-19 2018-09-14 武汉轻工大学 Method and apparatus for generating an invertible modulo-m matrix
CN108509386B (en) * 2018-04-19 2022-04-08 武汉轻工大学 Method and apparatus for generating an invertible modulo-m matrix
CN108536651B (en) * 2018-04-19 2022-04-05 武汉轻工大学 Method and apparatus for generating an invertible modulo-m matrix
CN109857982B (en) * 2018-11-06 2020-10-02 海南大学 Triangular part storage device of symmetric matrix and parallel reading method
CN109614149A (en) * 2018-11-06 2019-04-12 海南大学 Upper triangular part storage device of symmetric matrix and parallel reading method
CN109857982A (en) * 2018-11-06 2019-06-07 海南大学 Triangular part storage device of symmetric matrix and parallel reading method
CN109635236A (en) * 2018-11-06 2019-04-16 海南大学 Lower triangular part storage device of symmetric matrix and parallel reading method
CN109635236B (en) * 2018-11-06 2020-08-21 海南大学 Lower triangular part storage device of symmetric matrix and parallel reading method
CN109614582A (en) * 2018-11-06 2019-04-12 海南大学 Lower triangular part storage device of self-adjoint matrix and parallel reading method
CN109558567B (en) * 2018-11-06 2020-08-11 海南大学 Upper triangular part storage device of self-conjugate matrix and parallel reading method
CN109558567A (en) * 2018-11-06 2019-04-02 海南大学 Upper triangular part storage device of self-adjoint matrix and parallel reading method
CN112445752A (en) * 2019-08-28 2021-03-05 上海华为技术有限公司 Matrix inversion device based on Cholesky decomposition
CN112445752B (en) * 2019-08-28 2024-01-05 上海华为技术有限公司 Matrix inversion device based on Cholesky decomposition
CN111538946A (en) * 2020-04-24 2020-08-14 合肥工业大学 Quick verification system for operation result
US11488664B2 (en) 2020-10-13 2022-11-01 International Business Machines Corporation Distributing device array currents across segment mirrors
CN113608219A (en) * 2021-07-22 2021-11-05 北京无线电测量研究所 System and method for realizing uniform azimuth sampling for multi-channel SAR
CN113608219B (en) * 2021-07-22 2023-11-17 北京无线电测量研究所 Multichannel SAR-oriented azimuth uniform sampling realization system and method
TWI841330B (en) * 2023-03-31 2024-05-01 國立臺灣大學 Gaussian elimination computing system and gaussian elimination computing method
CN116662730B (en) * 2023-08-02 2023-10-20 之江实验室 Cholesky decomposition calculation acceleration system based on FPGA
CN116662730A (en) * 2023-08-02 2023-08-29 之江实验室 Cholesky decomposition calculation acceleration system based on FPGA

Similar Documents

Publication Publication Date Title
CN103927290A (en) Inverse operation method for lower triangle complex matrix with any order
CN105426345A (en) Matrix inverse operation method
CN102129420B (en) FPGA implementation device for solving least square problem based on Cholesky decomposition
CN102063411A (en) FFT/IFFT processor based on 802.11n
CN107341133B (en) Scheduling method of reconfigurable computing structure based on LU decomposition of arbitrary dimension matrix
CN103092560B (en) Low-power multiplier based on bypass technology
CN109284824B (en) Reconfigurable technology-based device for accelerating convolution and pooling operation
CN102135951B (en) FPGA (Field Programmable Gate Array) implementation method based on LS-SVM (Least Squares-Support Vector Machine) algorithm restructured at runtime
CN102298570A (en) Hybrid-radix fast Fourier transform (FFT)/inverse fast Fourier transform (IFFT) implementation device with variable point count and method thereof
Kono et al. Scalability analysis of tightly-coupled FPGA-cluster for lattice Boltzmann computation
CN103970720A (en) Embedded reconfigurable system based on large-scale coarse granularity and processing method of system
CN110543939A (en) Hardware acceleration implementation framework for convolutional neural network backward training based on FPGA
CN101763338A (en) Mixed-radix FFT/IFFT implementation device with variable point count and method thereof
CN109508175A (en) FPGA design of a pseudorandom number generator based on fractional-order chaos and Zu Chongzhi's algorithm
CN103984677A (en) Embedded reconfigurable system based on large-scale coarse granularity and processing method thereof
CN105227259A (en) Parallel M-sequence generation method and device
CN104268124A (en) FFT (Fast Fourier Transform) implementing device and method
Song et al. A fine-grained parallel EMTP algorithm compatible to graphic processing units
CN107368459B (en) Scheduling method of reconfigurable computing structure based on arbitrary dimension matrix multiplication
CN115034360A (en) Processing method and processing device for three-dimensional convolution neural network convolution layer
CN111275180B (en) Convolution operation structure for reducing data migration and power consumption of deep neural network
CN106873942B (en) The method that the MSD multiplication of structure amount computer calculates
CN102970545A (en) Static image compression method based on two-dimensional discrete wavelet transform algorithm
CN102004720B (en) Variable-length fast fourier transform circuit and implementation method
CN103902762A (en) Circuit structure for solving least-squares equations with positive definite symmetric matrices

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 2014-07-16
