CN112445752B - Matrix inversion device based on Cholesky decomposition - Google Patents

Matrix inversion device based on Cholesky decomposition

Info

Publication number
CN112445752B
Authority
CN
China
Prior art keywords
unit
data
matrix
control unit
shift
Prior art date
Legal status
Active
Application number
CN201910804096.XA
Other languages
Chinese (zh)
Other versions
CN112445752A (en)
Inventor
张应松
矫渊培
Current Assignee
Shanghai Huawei Technologies Co Ltd
Original Assignee
Shanghai Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Huawei Technologies Co Ltd
Priority to CN201910804096.XA
Priority to PCT/CN2020/086987 (WO2021036313A1)
Publication of CN112445752A
Application granted
Publication of CN112445752B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7896 Modular architectures, e.g. assembled from a number of identical packages
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F5/00 Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F5/01 Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
    • G06F5/015 Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising having at least two separately controlled shifting levels, e.g. using shifting matrices

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Computer Hardware Design (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a matrix inversion device based on Cholesky decomposition. The device comprises a data write control unit, a first data shift unit, a control unit, an operation unit, a second data shift unit, a storage unit and an output unit. The operation unit comprises 8 single-precision complex multiply-accumulate units (CMACs), each with a four-stage pipeline. The operation unit is connected to the control unit. The first data shift unit, the control unit, the second data shift unit and the output unit are each connected to the storage unit. The control unit is connected to the second data shift unit, and the data write control unit is connected to the first data shift unit. By using 8 CMACs, this technical solution addresses the limited internal computing resources and low resource utilization of current vector processors. This reduces the latency of Cholesky-based decomposition and inversion and improves network performance.

Description

Matrix inversion device based on Cholesky decomposition
Technical Field
The invention relates to the field of digital signal processing, and in particular to a matrix inversion device based on Cholesky decomposition.
Background
Cholesky decomposition is a common method for inverting positive definite matrices. The principle is as follows. For an n-order symmetric (for complex data, Hermitian) positive definite matrix $A$, there is a lower triangular matrix $L$ such that $A = LL^{H}$, where $L^{H}$ is the conjugate transpose of $L$. The inverse of $A$ is then $A^{-1} = (LL^{H})^{-1} = (L^{H})^{-1}L^{-1} = (L^{-1})^{H}L^{-1}$. In the industry, Cholesky-based decomposition and inversion is commonly implemented on vector processors.
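The identity above can be checked numerically. The following sketch is not from the patent. It uses NumPy's `np.linalg.cholesky`, which returns the lower-triangular factor $L$ with $A = LL^{H}$. For brevity only, it uses a general-purpose inverse for $L^{-1}$. The random test matrix is an arbitrary assumption.

```python
import numpy as np


def inverse_via_cholesky(a: np.ndarray) -> np.ndarray:
    """Invert a Hermitian positive definite matrix as A^{-1} = (L^{-1})^H L^{-1}."""
    l = np.linalg.cholesky(a)        # lower-triangular L with A = L @ L^H
    l_inv = np.linalg.inv(l)         # general inverse, used here only for brevity
    return l_inv.conj().T @ l_inv    # (L^{-1})^H @ L^{-1}


# Arbitrary 8x8 Hermitian positive definite test matrix.
rng = np.random.default_rng(0)
m = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
a = m @ m.conj().T + 8 * np.eye(8)

a_inv = inverse_via_cholesky(a)
print(np.allclose(a_inv @ a, np.eye(8)))  # True
```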
Cholesky-based matrix inversion involves many iterative and interleaved operations, and the amount of internal computation is very large. A vector processor relies mainly on its internal vector processing unit to perform Cholesky-based decomposition and inversion. However, current vector processors contain only 16 half-precision complex multiply-accumulate units (CMACs), which is equivalent to 4 single-precision CMACs. Ideally, only 4 single-precision complex operations can be performed at a time, so processing capability remains weak even if computing-resource utilization reaches 100%. In addition, the vector processor first completes the entire Cholesky decomposition and starts the inversion only after all decomposition results are available. Because of data dependencies, the decomposition needs fewer CMACs as the iterations proceed, while the inversion needs more CMACs as the iterations proceed. In both phases, CMAC utilization varies from iteration to iteration, so average CMAC utilization is low.
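The opposite trends described above can be illustrated with a deliberately simplified model. This model is an assumption, not taken from the patent:

- Decomposition step j produces the n - j entries of column j of $L$.
- Inversion step i produces the i + 1 entries of row i of $L^{-1}$.
- A step of width w occupies ceil(w / P) rounds on P parallel lanes. Here P = 4, matching the 4 single-precision CMACs mentioned above.

Pipeline latency and dependencies between steps are ignored. The function names are hypothetical.

```python
import math


def step_widths(n: int) -> tuple[list[int], list[int]]:
    """Independent outputs per step in the simplified two-phase model."""
    decomposition = [n - j for j in range(n)]  # column j of L: rows j..n-1
    inversion = [i + 1 for i in range(n)]      # row i of L^{-1}: columns 0..i
    return decomposition, inversion


def average_utilization(widths: list[int], lanes: int) -> float:
    """Share of issued lane slots doing useful work, with ceil(w / lanes) rounds per step."""
    issued = sum(math.ceil(w / lanes) * lanes for w in widths)
    return sum(widths) / issued


dec, inv = step_widths(8)
print(dec)                                # [8, 7, 6, 5, 4, 3, 2, 1]
print(inv)                                # [1, 2, 3, 4, 5, 6, 7, 8]
print(average_utilization(dec, lanes=4))  # 0.75
```

In this model, the steps whose width is not a multiple of the lane count leave lanes idle. This is one way to see why the average utilization falls below the peak.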
In summary, current vector processors have few internal computing resources, and those resources are poorly utilized. As a result, Cholesky-based decomposition and inversion has a long processing latency. This processing easily becomes the bottleneck of the link and affects network performance.
Disclosure of Invention
Embodiments of the invention provide a matrix inversion device based on Cholesky decomposition, which can reduce the latency of Cholesky decomposition and inversion processing and improve network performance.
The application provides a matrix inversion device based on Cholesky decomposition. The device includes a data write control unit, a first data shift unit, a control unit, an operation unit, a second data shift unit, a storage unit and an output unit. The operation unit includes 8 single-precision complex multiply-accumulate units (CMACs), each with a four-stage pipeline. The operation unit is connected to the control unit. The first data shift unit, the control unit, the second data shift unit and the output unit are each connected to the storage unit. The control unit and the second data shift unit are interconnected, and the data write control unit is connected to the first data shift unit.
In the matrix inversion device based on Cholesky decomposition, the data write control unit is configured to complete write control of a matrix. The matrix is an N-order positive definite matrix, and N is an integer greater than 1 and less than or equal to 32.
In the matrix inversion device based on Cholesky decomposition, the first data shift unit is configured to shift the diagonal data of the matrix to the first position of each column to obtain first shift data.
In the matrix inversion device based on Cholesky decomposition, the control unit is configured for communication and control among the storage unit, the second data shift unit and the operation unit.
In the matrix inversion device based on Cholesky decomposition, the operation unit is configured to perform N parallel iterative operations on the matrix according to control information from the control unit, obtaining an operation result for each of the N parallel iterative operations. The result of the x-th parallel iterative operation is obtained from the x-th column data of the N-order positive definite matrix and the results of the previous (x-1) iterative operations. It comprises two parts. The first is the x-th column data of the lower triangular matrix obtained by Cholesky decomposition of the matrix. The second is the x-th row data of the inverse matrix of the matrix. The x-th column data excludes the diagonal element of the lower triangular matrix, and x is an integer greater than 0 and less than or equal to N.
In the matrix inversion device based on Cholesky decomposition, the second data shift unit is configured to shift the result of each parallel iterative operation to obtain second shift data. The second shift data serves as input to the next parallel iterative operation.
In the matrix inversion device based on Cholesky decomposition, the storage unit is configured to store the first shift data and the second shift data of the result of each iterative operation. The storage unit is a local cache that can hold at most one 32×32 matrix of 64-bit elements.
In the matrix inversion device based on Cholesky decomposition, the output unit is configured to output the inverse matrix according to the second shift data of the results of each iterative operation stored in the storage unit.
The matrix inversion apparatus can interleave positive definite matrices as follows:
- N = 8: 4 matrices simultaneously.
- N = 4: 8 matrices simultaneously.
- N = 16: 2 matrices simultaneously.
- N = 32: no interleaving is required.

When N is not an integer multiple of 8, N can be rounded up to the nearest larger multiple of 8. The interleaving then follows that of a positive definite matrix whose order is that multiple of 8.
Embodiments of the present application provide a matrix inversion device based on Cholesky decomposition. It addresses the limited internal computing resources and low resource utilization of current vector processors, thereby reducing the latency of Cholesky decomposition and inversion processing and improving network performance.
Drawings
Fig. 1 is a schematic diagram of an embodiment of a matrix inversion device based on Cholesky decomposition according to an embodiment of the present application;
FIG. 2 (a) is a schematic diagram of a matrix format received by a data write control unit according to an embodiment of the present application;
FIG. 2 (b) is a schematic diagram of a matrix format output by the data write control unit according to the embodiment of the present application;
fig. 3 is a schematic diagram of first shift data after a first data shift unit shifts a matrix according to an embodiment of the present application;
fig. 4 (a) is a schematic diagram of the data changes during the first parallel iterative operation in the operation unit for 4 groups of interleaved 8-order matrices according to an embodiment of the present application;
fig. 4 (b) is a schematic diagram of the data changes during the second parallel iterative operation in the operation unit for 4 groups of interleaved 8-order matrices according to an embodiment of the present application;
fig. 4 (c) is a schematic diagram of the data changes during the third parallel iterative operation in the operation unit for 4 groups of interleaved 8-order matrices according to an embodiment of the present application;
fig. 5 is a schematic diagram of the storage format of the result of each parallel iterative operation for 4 groups of interleaved 8-order matrices according to an embodiment of the present application.
Detailed Description
Embodiments of the present application provide a matrix inversion device based on Cholesky decomposition, which can reduce the latency of Cholesky decomposition and inversion processing and improve network performance.
To help those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments are described below in detail with reference to the accompanying drawings. The described embodiments are merely some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present application.
The terms first, second and the like in the description and in the claims of the present application and in the above-described figures, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
When solving a matrix equation such as $AX = B$, the inverse $A^{-1}$ must first be computed, after which $X$ is obtained as $X = A^{-1}B$.
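The Cholesky decomposition algorithm is a very common matrix decomposition method. Its basic principle is as follows. For an n-order symmetric positive definite matrix $A$, there is a lower triangular matrix $L$ such that $A = LL^{H}$. The diagonal entries of $L$ are all positive real numbers, and $L^{H}$ is the conjugate transpose of the lower triangular matrix $L$.
The basic formulas of the Cholesky decomposition algorithm are:
$$l_{jj} = \sqrt{a_{jj} - \sum_{k=1}^{j-1} \lvert l_{jk} \rvert^{2}}, \qquad l_{ij} = \frac{1}{l_{jj}}\left(a_{ij} - \sum_{k=1}^{j-1} l_{ik}\,\overline{l_{jk}}\right), \quad i = j+1, \ldots, n,$$
where $j = 1, 2, \ldots, n$. The initial values of $l_{jj}$ and $l_{ij}$ (for $j = 1$) are:
$$l_{11} = \sqrt{a_{11}}, \qquad l_{i1} = \frac{a_{i1}}{l_{11}}, \quad i = 2, \ldots, n.$$
Finally, the inverse of $A$ is obtained from $A^{-1} = (LL^{H})^{-1} = (L^{H})^{-1}L^{-1} = (L^{-1})^{H}L^{-1}$.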
A vector processor implements Cholesky-based matrix inversion in two main steps.

First, it performs the decomposition, obtaining the lower triangular matrix $L$ from $A = LL^{H}$. Because of data dependencies, the vector processor must process matrix $A$ column by column. It computes the first column, then the second column after the first is finished, and so on; each column depends on the results of all previous columns.

After the Cholesky decomposition yields the lower triangular matrix of $A$, the inversion is performed, i.e. $A^{-1}$ is computed from $A^{-1} = (L^{-1})^{H}L^{-1}$. Again because of data dependencies, the vector processor computes the inverse row by row from the decomposition result, starting the next row only after the current row is complete.
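A minimal pure-Python sketch of this two-phase baseline follows. It assumes a small Hermitian positive definite matrix given as nested lists and applies the formulas above with 0-based indices. The decomposition runs column by column, and only then does the inversion of $L$ run row by row. The function names are hypothetical and do not describe any particular vector-processor implementation.

```python
import math


def cholesky_lower(a):
    """Column-by-column Cholesky factor L (A = L L^H) of a Hermitian PD matrix."""
    n = len(a)
    l = [[0j] * n for _ in range(n)]
    for j in range(n):  # column j needs every earlier column of L
        s = a[j][j] - sum(abs(l[j][k]) ** 2 for k in range(j))
        l[j][j] = complex(math.sqrt(s.real))  # diagonal entries are positive reals
        for i in range(j + 1, n):
            t = a[i][j] - sum(l[i][k] * l[j][k].conjugate() for k in range(j))
            l[i][j] = t / l[j][j]
    return l


def invert_lower(l):
    """Row-by-row inverse D = L^{-1}; row i needs rows 0..i-1 of D."""
    n = len(l)
    d = [[0j] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 1 / l[i][i]
        for k in range(i):
            d[i][k] = -sum(l[i][m] * d[m][k] for m in range(k, i)) / l[i][i]
    return d


def two_phase_inverse(a):
    """Baseline: finish the whole decomposition, then invert, then A^{-1} = D^H D."""
    n = len(a)
    d = invert_lower(cholesky_lower(a))
    return [[sum(d[k][i].conjugate() * d[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


a = [[4, 2 + 1j], [2 - 1j, 3]]
inv = two_phase_inverse(a)
# inv approximates (1/7) * [[3, -(2+1j)], [-(2-1j), 4]]
```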
Therefore, current vector processors start the inversion only after all decomposition results are obtained. As a result, CMAC utilization is low, on top of the vector processor's already limited CMAC computing resources.
Fig. 1 is a schematic diagram of an embodiment of a matrix inversion apparatus 10 based on Cholesky decomposition according to an embodiment of the present application.
Referring to Fig. 1, the matrix inversion apparatus 10 based on Cholesky decomposition includes a data write control unit 101, a first data shift unit 102, a control unit 103, an operation unit 104, a second data shift unit 105, a storage unit 106 and an output unit 107. The operation unit 104 includes 8 single-precision complex multiply-accumulate units (CMACs), each with a four-stage pipeline. The operation unit 104 is connected to the control unit 103. The first data shift unit 102, the control unit 103, the second data shift unit 105 and the output unit 107 are each connected to the storage unit 106. The control unit 103 is connected to the second data shift unit 105, and the data write control unit 101 is connected to the first data shift unit 102.
The matrix inversion device 10 based on Cholesky decomposition includes 8 single-precision CMACs, each with a four-stage pipeline. It can directly support decomposition and inversion of N-order positive definite matrices with N less than or equal to 32. When N is greater than 32, the matrix can be split by software into blocks of order 32 or less for computation; this is not limited in the embodiments of the present application.
Because the matrix inversion apparatus 10 includes 8 single-precision CMACs, each with a four-stage pipeline, it supports the following interleaving:
- N = 8: 4 positive definite matrices simultaneously.
- N = 4: 8 matrices simultaneously.
- N = 16: 2 matrices simultaneously.
- N = 32: no interleaving is required.

In these cases, CMAC utilization can reach 100%. When N is not an integer multiple of 8, N can be rounded up to the nearest larger multiple of 8. The matrices are then interleaved in the same way as positive definite matrices whose order is that multiple of 8.
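The interleaving counts above follow from 8 CMACs × 4 pipeline stages = 32 operations in flight. In every listed case, N × (number of interleaved matrices) = 32. The sketch below reproduces that arithmetic. The constant and function names are hypothetical.

The handling of other orders is our reading of the text, not a rule the patent states explicitly: pad to 4 when N ≤ 4, otherwise round up to a multiple of 8, then interleave 32 // padded matrices.

```python
NUM_CMACS = 8          # single-precision CMACs in the operation unit
PIPELINE_STAGES = 4    # pipeline depth of each CMAC
SLOTS = NUM_CMACS * PIPELINE_STAGES  # 32 operations in flight


def interleave_plan(n: int) -> tuple[int, int]:
    """Return (padded order, number of interleaved matrices) for order n.

    N = 4, 8, 16, 32 reproduce the counts in the description; padding other
    orders (to 4, or up to the next multiple of 8) is an assumed reading.
    """
    if not 1 < n <= 32:
        raise ValueError("orders above 32 are split by software first")
    padded = 4 if n <= 4 else -(-n // 8) * 8   # ceil(n / 8) * 8
    return padded, SLOTS // padded


for n in (4, 8, 16, 32, 12):
    print(n, interleave_plan(n))
```

Under this reading, N = 12 is padded to 16 and shares the slots like two 16-order matrices. The padded positions simply leave some slots idle, which is consistent with the description claiming full utilization only for N = 4, 8, 16 and 32.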
Specifically, in the embodiment of the present application, the functions of the respective functional units included in the matrix inversion apparatus 10 are as follows:
the data writing control unit 101 is configured to complete writing control of a matrix, and it should be noted that, in the embodiment of the present application, the matrix is an N-order positive definite matrix, and N is an integer greater than 1 and less than or equal to 32.
The first data shift unit 102 is configured to shift the diagonal data of the matrix to the first position of each column to obtain first shift data.
The control unit 103 is configured to schedule and control the entire computing task.
The operation unit 104 is configured to perform N parallel iterative operations on the matrix according to the control signal of the control unit, obtaining an operation result for each of the N parallel iterative operations. The result of the x-th parallel iterative operation is obtained from the x-th column data of the N-order positive definite matrix and the results of the previous (x-1) iterative operations. It includes the x-th column data of the lower triangular matrix obtained by Cholesky decomposition and the x-th row data of the inverse matrix of the matrix, where x is an integer greater than 0 and less than or equal to N.
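A sequential software sketch of this fused schedule follows. It is a software model, not the hardware, which computes each iteration's entries in parallel on the CMACs. In iteration x (0-based here), the sketch produces the sub-diagonal column x of $L$ and row x of $D = L^{-1}$.

Row x of $D$ becomes computable in the same iteration: the entries $l_{x0}, \ldots, l_{x,x-1}$ were produced by earlier columns, $l_{xx}$ is produced now, and rows 0 to x-1 of $D$ already exist.

The sketch rests on two assumptions:
- The "inverse matrix" rows in the text are taken to be rows of $L^{-1}$.
- The final product $A^{-1} = D^{H}D$ is formed after the loop; the description does not say where this product is computed.

The function name is hypothetical.

```python
import math


def fused_cholesky_inverse(a):
    """Iteration x yields column x of L (below the diagonal) and row x of D = L^{-1}."""
    n = len(a)
    l = [[0j] * n for _ in range(n)]
    d = [[0j] * n for _ in range(n)]
    for x in range(n):
        # Column x of L: uses column x of A and columns 0..x-1 of L from earlier iterations.
        s = a[x][x] - sum(abs(l[x][k]) ** 2 for k in range(x))
        l[x][x] = complex(math.sqrt(s.real))
        for i in range(x + 1, n):
            t = a[i][x] - sum(l[i][k] * l[x][k].conjugate() for k in range(x))
            l[i][x] = t / l[x][x]
        # Row x of D: row x of L is now complete, and rows 0..x-1 of D already exist.
        d[x][x] = 1 / l[x][x]
        for k in range(x):
            d[x][k] = -sum(l[x][m] * d[m][k] for m in range(k, x)) / l[x][x]
    # Assumed final step: A^{-1} = D^H D.
    inv = [[sum(d[k][i].conjugate() * d[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
    return l, d, inv


a = [[4, 2 + 1j], [2 - 1j, 3]]
_, _, inv = fused_cholesky_inverse(a)
# inv approximates (1/7) * [[3, -(2+1j)], [-(2-1j), 4]]
```

Compared with a decompose-then-invert schedule, the arithmetic is the same. Only the order changes: each iteration interleaves one column of the decomposition with one row of the inversion.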
The second data shift unit 105 is configured to shift the result of each parallel iterative operation to obtain second shift data, which is used as input to the next parallel iterative operation.
A storage unit 106, configured to store the first shift data and second shift data of an operation result of each iterative operation.
In this embodiment, the storage unit 106 gives priority to requests from the operation unit 104 and responds to external input only when the operation unit 104 has no pending request. The storage unit 106 also regulates the number of matrices received by applying backpressure to the upstream stage when its internal buffer is full. The storage unit is a local cache that can hold at most one 32×32 matrix of 64-bit elements. Optionally, the bandwidth of the storage unit 106 is 64 × 8 = 512 bits, which may be implemented as 4 banks of 128 bits × 128 depth.
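The storage figures above are mutually consistent, as this small check shows. It rests on two assumptions:
- A 64-bit element is one single-precision complex value, i.e. a 32-bit real part and a 32-bit imaginary part.
- The factor 8 in "64 × 8" is the number of CMACs.

The constant names are illustrative.

```python
ORDER = 32             # largest order handled directly (from the description)
ELEMENT_BITS = 64      # assumed: one complex value = 32-bit real + 32-bit imaginary
NUM_CMACS = 8          # assumed meaning of the factor 8 in "64 x 8"
BANK_WIDTH_BITS = 128
BANK_DEPTH = 128
NUM_BANKS = 4

matrix_bits = ORDER * ORDER * ELEMENT_BITS                # one full 32x32 matrix
capacity_bits = BANK_WIDTH_BITS * BANK_DEPTH * NUM_BANKS  # 4 banks of 128 x 128
port_bits = BANK_WIDTH_BITS * NUM_BANKS                   # all banks accessed in parallel
print(matrix_bits, capacity_bits)           # 65536 65536
print(ELEMENT_BITS * NUM_CMACS, port_bits)  # 512 512
```

The four banks hold exactly one 32×32 matrix of 64-bit elements. Accessing all banks at once delivers 512 bits, i.e. one element per CMAC.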
An output unit 107, configured to output an inverse matrix of the matrix according to the second shift data of the operation result of each iterative operation stored in the storage unit 106.
Optionally, the functions of each unit module in the matrix inversion apparatus 10 are described below in detail using the case of N = 8 with 4 interleaved matrices.
In this embodiment, when N = 8 and 4 matrices are interleaved, the 4 matrices may be different 8-order positive definite matrices. The data write control unit 101 first completes write control of the externally input matrices, all of which are received row by row. As shown in Fig. 2 (a), matrices 1 to 4 are the 4 matrices received row by row by the data write control unit 101.

Specifically, the write control performed by the data write control unit 101 may consist of conjugating each received row. This converts the row-ordered input into column order, because for a Hermitian matrix the conjugate of row i equals column i. The matrix format output by the data write control unit 101 to the first data shift unit 102 is shown in Fig. 2 (b).
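A minimal sketch of this row-to-column conversion, using a hypothetical function name and a small 2×2 Hermitian example. Because $A = A^{H}$, conjugating the entries of row i reproduces column i, so no actual transposition of storage is needed.

```python
def rows_to_columns_by_conjugation(rows):
    """Conjugate each received row; for a Hermitian matrix this yields the columns."""
    return [[v.conjugate() for v in row] for row in rows]


a = [[4, 2 + 1j], [2 - 1j, 3]]                 # Hermitian: a[i][j] == conj(a[j][i])
cols = rows_to_columns_by_conjugation(a)
print(cols)                                    # [[4, (2-1j)], [(2+1j), 3]]
```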
In this embodiment, according to the Cholesky decomposition formula, the diagonal data dii of the inverse matrix must first be obtained from the diagonal data of the matrix before the decomposition and inversion of each column can continue. The operation for each column must therefore start with the diagonal data. Searching for the diagonal data during later operations would make addressing complex. To avoid this, the first data shift unit 102 shifts the diagonal data of each column of each matrix to the first position, obtaining first shift data. The first shift data is stored in the storage unit 106 as input to subsequent computation.
In this embodiment, after receiving the matrix from the data write control unit 101, the first data shift unit 102 shifts the diagonal data of the matrix to the first position of each column, which simplifies addressing in the subsequent computation. The shifted data is the first shift data, and it is stored in the storage unit 106. For example, Fig. 3 shows the first shift data obtained for each of matrices 1 to 4 after shifting by the first data shift unit 102.
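A minimal sketch of the first shift, under two assumptions:
- The matrix is held as a list of columns.
- The shift is a cyclic rotation of column j up by j positions.

The description only states that the diagonal element ends up first in each column. Fig. 3 may arrange the remaining elements differently. The function name is hypothetical.

```python
def shift_diagonal_to_front(columns):
    """Rotate column j up by j positions so that element (j, j) comes first.

    Cyclic rotation is an assumption; the remaining order may differ in Fig. 3.
    """
    return [col[j:] + col[:j] for j, col in enumerate(columns)]


# Column j of a 3x3 matrix holds the labels a0j, a1j, a2j.
cols = [[f"a{i}{j}" for i in range(3)] for j in range(3)]
for col in shift_diagonal_to_front(cols):
    print(col)
```

After the rotation, each column starts with its diagonal element followed by the sub-diagonal elements that the Cholesky formulas consume. Every iteration can therefore read its diagonal input from a fixed offset.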
In this embodiment, the control unit 103 performs the following functions:
- It handles communication among the operation unit 104, the second data shift unit 105 and the storage unit 106.
- It calculates the addresses of data in the storage unit 106.
- It controls each iterative operation in the operation unit 104.
- It controls data shifting in the second data shift unit 105.
- It controls the interleaved computation of multiple matrices.

The overall computing task involves relatively little sequential logic and more combinational logic.
For example, in this embodiment, when N = 8 and 4 matrices are interleaved, each matrix requires 8 parallel iterative operations. Figs. 4 (a) to 4 (c) show how the data changes during the parallel iterative operations in the operation unit 104 for the 4 interleaved 8-order matrices.
As shown in fig. 4 (a), in the first parallel iterative operation process, the control unit 103 controls the input of the first column component data in the first shift data of each of the matrices 1-4 in the operation unit 104, and obtains the first parallel iterative operation result corresponding to each matrix through the operation of 8 CMACs, where the first parallel iterative operation result includes the first column vector data L10-L70 of the lower triangular matrix corresponding to each matrix and the first row vector data d00 of the inverse matrix.
After obtaining the first parallel iterative operation result, the control unit 103 controls the second data shift unit 105 to perform data shift on the first parallel iterative operation result to obtain second shift data corresponding to each matrix as input of the next iterative operation, as second shift data of the first parallel iterative result shown in fig. 4 (b).
Fig. 4 (b) shows the second parallel iterative operation. The control unit 103 feeds the operation unit 104 with two inputs for each of matrices 1 to 4: the second column data from the first shift data, and the second shift data of the first parallel iterative operation result. The 8 CMACs then produce the second parallel iterative operation result for each matrix. This result includes the second column vector data L21 to L71 of the corresponding lower triangular matrix and the second row vector data d10 and d11 of the inverse matrix.
Correspondingly, after obtaining the second parallel iterative operation result, the control unit 103 controls the second data shift unit 105 to perform data shift on the second parallel iterative operation result to obtain second shift data corresponding to each matrix, as input of the next iterative operation, as second shift data of the second parallel iterative result shown in fig. 4 (c).
Fig. 4 (c) shows the third parallel iterative operation. The control unit 103 feeds the operation unit 104 with the third column data from the first shift data of each of matrices 1 to 4. It also feeds the second shift data of the first and second parallel iterative operation results. The 8 CMACs then produce the third parallel iterative operation result for each matrix. This result includes the third column vector data L32 to L72 of the corresponding lower triangular matrix and the third row vector data d20, d21 and d22 of the inverse matrix.
Correspondingly, after obtaining the third parallel iterative operation result, the control unit 103 controls the second data shift unit 105 to perform data shift on the third parallel iterative operation result to obtain second shift data corresponding to each matrix, as input of the next iterative operation, as second shift data of the third parallel iterative operation result shown in fig. 4 (c).
By analogy, until the eighth parallel iterative operation, the control unit 103 controls the input of the eighth column component data in the first shift data of each matrix of the matrices 1-4 and the second shift data corresponding to the operation result of the previous 7 parallel iterative operations in the operation unit 104, and obtains the operation result of the eighth parallel iterative operation corresponding to each matrix through the operation of 8 CMACs, where the operation result of the eighth parallel iterative operation includes the eighth row vector data d70-d77 of the inverse matrix corresponding to each matrix.
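The walkthrough above follows a regular pattern. The hypothetical helper below uses 0-based iteration indices and the two-digit labels from the text. It lists what iteration x produces for one N-order matrix: the sub-diagonal entries of column x of L and the entries dx0 to dxx of row x of the inverse.

Under this labeling every iteration yields (N - x - 1) + (x + 1) = N values. The number of outputs therefore stays constant across iterations, unlike the shrinking and growing widths of separate decomposition and inversion phases.

```python
def iteration_outputs(n: int, x: int) -> tuple[list[str], list[str]]:
    """Labels produced by parallel iteration x (0-based) for one N-order matrix."""
    l_column = [f"L{i}{x}" for i in range(x + 1, n)]   # column x of L, below the diagonal
    d_row = [f"d{x}{k}" for k in range(x + 1)]         # row x of the inverse, up to the diagonal
    return l_column, d_row


for x in (0, 1, 2, 7):
    print(x, *iteration_outputs(8, x))
```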
After the end of the eight parallel iterative operations, the storage format of the operation result of each parallel iterative operation stored in the storage unit 106 is shown in fig. 5. Finally, the output unit 107 reads the operation result of each parallel iterative operation in the storage unit 106, and outputs the operation result.
The functions of the unit modules in the matrix inversion apparatus 10 have been described above using the case of N = 8 with 4 interleaved matrices as an example. For positive definite matrices with N not equal to 8, the matrix inversion apparatus 10 can perform decomposition and inversion on the same principle. The following cases all fall within the protection scope of the present application:
- N = 4 with 8 interleaved matrices.
- N = 16 with 2 interleaved matrices.
- N = 32 without matrix interleaving.
- N not an integer multiple of 8, where N is rounded up to the nearest larger multiple of 8 and the matrices are interleaved in the same way as for that order.

The matrix inversion device provided by the embodiments of the present application addresses the limited internal computing resources and low resource utilization of current vector processors. It thereby reduces the latency of Cholesky-based decomposition and inversion and improves network performance.
It will be appreciated that the numerals and letters used in the embodiments of the present application are merely for convenience of description and are not intended to limit the scope of the embodiments. The sequence numbers of the processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic.
The Cholesky decomposition-based matrix inversion device provided in the embodiments of the present application has been described in detail above. Specific examples have been used to illustrate the principles and implementations of the present invention, and the description of these examples is intended only to help in understanding the method and core idea of the present invention. Meanwhile, those skilled in the art may vary the specific implementations and scope of application according to the ideas of the present invention; therefore, this description should not be construed as limiting the present invention.

Claims (6)

1. A matrix inversion device based on Cholesky decomposition, comprising a data writing control unit, a first data shifting unit, a control unit, an operation unit, a second data shifting unit, a storage unit and an output unit, characterized in that the operation unit comprises 8 single-precision complex multiply-add units (CMACs), each CMAC having a four-stage pipeline operation structure; the operation unit is connected with the control unit; the first data shifting unit, the control unit, the second data shifting unit and the output unit are each connected with the storage unit; the control unit is connected with the second data shifting unit; and the data writing control unit is connected with the first data shifting unit;
the data writing control unit is configured to control the writing of a matrix, wherein the matrix is a positive definite matrix of order N, and N is an integer greater than 1 and less than or equal to 32;
the operation unit is configured to perform N parallel iterative operations on the matrix according to control information from the control unit, so as to obtain an operation result of each of the N parallel iterative operations, where the operation result of the x-th parallel iterative operation is obtained from the x-th column component data of the N-order positive definite matrix and the operation results of the previous (x-1) parallel iterative operations; the operation result of the x-th parallel iterative operation includes the x-th column component data of the lower triangular matrix obtained by Cholesky decomposition of the matrix and the x-th row component data of the inverse matrix of the matrix; the x-th column component data does not include the diagonal data of the lower triangular matrix; and x is an integer greater than 0 and less than or equal to N.
2. The apparatus according to claim 1, wherein the first data shifting unit is configured to shift the diagonal data of the matrix to the first position of each column to obtain first shift data;
the storage unit is used for storing the first shift data.
3. The apparatus according to claim 1 or 2, wherein the control unit is configured to handle communication and control among the storage unit, the second data shifting unit, and the operation unit.
4. The apparatus according to claim 1, wherein the second data shifting unit is configured to shift the data of the operation result of each parallel iterative operation to obtain second shift data of that operation result, the second shift data being used as input to the next parallel iterative operation.
5. The apparatus according to claim 4, wherein the storage unit is configured to store the second shift data of the operation result of each parallel iterative operation.
6. The apparatus according to claim 5, wherein the output unit is configured to output the inverse matrix based on the second shift data of the operation result of each parallel iterative operation stored in the storage unit.
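Claim 2 says only that the first data shifting unit moves the diagonal data to the first position of each column. One simple way to obtain that layout, sketched below as an assumption rather than the patented circuit, is to rotate column j upward by j positions, so the diagonal element and the sub-diagonal elements of every column start at lane 0. Whether the device wraps the above-diagonal entries around (as this circular shift does) or discards them is not stated; for a Hermitian matrix those entries only repeat the conjugated lower part. The function name `first_shift` is hypothetical.

```python
def first_shift(a):
    """Rotate column j of a square matrix up by j positions (hypothetical reading of claim 2).

    Afterwards, a[j][j] sits at index 0 of shifted column j, followed by
    the sub-diagonal elements a[j+1][j], ..., a[n-1][j].
    """
    n = len(a)
    shifted = []
    for j in range(n):
        column = [a[i][j] for i in range(n)]
        shifted.append(column[j:] + column[:j])  # circular shift by j
    return shifted  # shifted[j] is the j-th column after shifting


print(first_shift([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

This prints `[[1, 4, 7], [5, 8, 2], [9, 3, 6]]`: each shifted column begins with its diagonal element (1, 5, 9), which would let every column's diagonal be read from the same position.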

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910804096.XA CN112445752B (en) 2019-08-28 2019-08-28 Matrix inversion device based on Cholesky decomposition
PCT/CN2020/086987 WO2021036313A1 (en) 2019-08-28 2020-04-26 Cholesky decomposition-based matrix inversion apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910804096.XA CN112445752B (en) 2019-08-28 2019-08-28 Matrix inversion device based on Cholesky decomposition

Publications (2)

Publication Number Publication Date
CN112445752A (en) 2021-03-05
CN112445752B (en) 2024-01-05

Family

ID=74685533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910804096.XA Active CN112445752B (en) 2019-08-28 2019-08-28 Matrix inversion device based on Cholesky decomposition

Country Status (2)

Country Link
CN (1) CN112445752B (en)
WO (1) WO2021036313A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1783060A (en) * 2004-11-26 2006-06-07 北京天碁科技有限公司 Cholesky decomposition algorithm device
CN101825998A (en) * 2010-01-22 2010-09-08 北京龙芯中科技术服务中心有限公司 Instruction execution method for vector complex multiplication operation and corresponding device
US8775496B1 (en) * 2011-07-29 2014-07-08 Xilinx, Inc. Circuits and methods for calculating a cholesky decomposition of a matrix
CN103927290A (en) * 2014-04-18 2014-07-16 南京大学 Inversion method for lower triangular complex matrices of arbitrary order
CN105426345A (en) * 2015-12-25 2016-03-23 南京大学 Matrix inverse operation method
CN105701068A (en) * 2016-02-19 2016-06-22 南京大学 Cholesky matrix inversion system based on time division multiplexing technology

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8396914B1 (en) * 2009-09-11 2013-03-12 Altera Corporation Matrix decomposition in an integrated circuit device
CN108733627A (en) * 2018-04-30 2018-11-02 南京大学 FPGA implementation method for Cholesky decomposition of positive definite matrices
CN109635241B (en) * 2018-12-17 2023-09-01 西南电子技术研究所(中国电子科技集团公司第十研究所) Method for solving symmetric or hermitian symmetric positive definite matrix inverse matrix

Also Published As

Publication number Publication date
WO2021036313A1 (en) 2021-03-04
CN112445752A (en) 2021-03-05

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant