CN109446478B - Complex covariance matrix calculation system based on iteration and reconfigurable mode - Google Patents

Complex covariance matrix calculation system based on iteration and reconfigurable mode Download PDF

Info

Publication number
CN109446478B
CN109446478B CN201811284263.4A CN201811284263A CN109446478B CN 109446478 B CN109446478 B CN 109446478B CN 201811284263 A CN201811284263 A CN 201811284263A CN 109446478 B CN109446478 B CN 109446478B
Authority
CN
China
Prior art keywords
complex
covariance matrix
data
matrix
mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811284263.4A
Other languages
Chinese (zh)
Other versions
CN109446478A (en
Inventor
李丽
陈辉
傅玉祥
陈沁雨
何国强
何书专
李伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201811284263.4A priority Critical patent/CN109446478B/en
Publication of CN109446478A publication Critical patent/CN109446478A/en
Application granted granted Critical
Publication of CN109446478B publication Critical patent/CN109446478B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to a complex covariance matrix calculation system based on iteration and reconfigurable mode, comprising an on-chip SRAM memory, an off-chip DDR memory, a reconfigurable unit, a DMA controller and an acceleration core, wherein the acceleration core comprises: the matrix covariance operation module polls the source data of each area of the on-chip SRAM memory in an iterative computation mode and computes a lower triangular covariance matrix; the conjugate symmetry module obtains a complete complex covariance matrix from the lower triangular covariance matrix in an address mapping and reconstruction storage mode according to the conjugate symmetry property of the covariance matrix to form a final operation result; and the DMA interface function module is used for storing the data read from the off-chip DDR memory in a DMA mode into the on-chip SRAM memory in a partition mode. Has the advantages that: the invention supports the covariance operation of the complex matrix with any column number, and reduces the source data calculation amount and the time for writing the result data back to the DDR for many times in the traditional hardware implementation mode.

Description

Complex covariance matrix calculation system based on iteration and reconfigurable mode
Technical Field
The invention relates to the technical field of computers, in particular to a complex covariance matrix calculation system based on iteration and reconfigurable modes.
Background
The calculation of the covariance matrix is typical operation in the field of signal processing, is a key part for realizing multi-level nested wiener filters, spatial spectrum estimation, coherent source number estimation and affine invariant pattern recognition, and is widely applied to the fields of radar, sonar, digital image processing and the like. In addition, the covariance matrix is widely applied to the fields of image matching, image pattern recognition and the like, but the calculation process of the covariance matrix is complex, and if the covariance matrix of a multidimensional random variable composed of a plurality of eigenvectors in each region of an image needs to be calculated, the time consumption is long, and the calculation of the covariance matrix of the image on a personal universal PC platform in real time is also a great obstacle.
With the rapid development of the integrated circuit industry, high performance and real time are constantly pursued in the embedded field. Currently, most of the hardware implementation for the complex covariance matrix is designed based on platforms such as a DSP, a GPU, and an FPGA. The invention provides a hardware implementation method for computing a complex covariance matrix based on an iteration and reconfigurable mode based on a reconfigurable intelligent acceleration core, and compared with the traditional hardware implementation mode, the hardware implementation method has the advantages of high resource utilization rate and high hardware implementation speed. As typical operation in the field of signal processing, the hardware implementation method has good reference significance and wide application prospect.
In statistics and probability theory, the covariance matrix is a matrix. Each element of which is the covariance between the elements of the respective vector. Let X ═ X1,X2,X3,...,XN)TFor n-dimensional random variables, called matrices
Figure BDA0001847970200000011
Covariance matrix, denoted as d (X), for n-dimensional random variable X. Wherein, Cij=Cov(Xi,Xj) I, j is 1, …, n is the component X of XiAnd XjThe covariance of (a). Due to Cij=CjiTherefore, the covariance matrix is a symmetric matrix.
As shown in fig. 1, the conventional hardware implementation is as follows: for an mxn complex matrix a ═ aij]Its complex covariance matrix B may be composed ofThe following formula gives: b ═ AAH. Namely, firstly, the matrix A is transposed, and then conjugate is solved to obtain AHFinally, each element of the matrix B is obtained as follows:
Figure BDA0001847970200000012
because the method is realized according to matrix multiplication, when the covariance of a complex matrix with a large number of points is calculated, data is carried in and carried out for multiple times, which can cause the access and storage time to be overlong, and the realization method does not reduce the operation amount according to the conjugate symmetry property of the covariance matrix, which are reasons of long calculation time of the covariance matrix. In many practical application scenarios, the low-efficiency calculation of the covariance matrix can become a great obstacle, and the method has certain reference significance and application prospect.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, provides a complex covariance matrix calculation system based on iteration and reconfigurable modes, which effectively reduces the calculation amount of source data, fully utilizes storage resources, accelerates the calculation speed, and further integrally improves the performance of an application algorithm, and is specifically realized by the following technical scheme:
the complex covariance matrix computing system based on iteration and reconfigurable mode comprises an on-chip SRAM (static random access memory), an off-chip DDR (double data rate) memory, a reconfigurable unit, a DMA (direct memory access) controller and an acceleration core, wherein the acceleration core is respectively in communication connection with the on-chip SRAM memory and the reconfigurable unit, the DMA controller is in communication connection with the reconfigurable unit, the off-chip DDR memory is in communication connection with the DMA controller through a bus, and the acceleration core comprises:
the matrix covariance operation module polls the source data of each area of the on-chip SRAM memory in an iterative computation mode and computes a lower triangular covariance matrix;
the conjugate symmetry module obtains a complete complex covariance matrix from the lower triangular covariance matrix in an address mapping and reconstruction storage mode according to the conjugate symmetry property of the covariance matrix to form a final operation result;
the DMA interface function module is used for storing data read from the off-chip DDR memory in a DMA mode into the on-chip SRAM memory in a partition mode; and writing the operation result back to the off-chip DDR memory in a DMA mode.
The complex covariance matrix calculation system based on the iteration and reconfigurable mode is further designed in that the on-chip SRAM memory sets storage resources to be divided into k banks, the depth of each bank is d, if M banks are allocated for storing a lower triangular covariance matrix, covariance operation is carried out on a complex matrix with the size of M multiplied by N, and the requirement of M multiplied by N is met2Md is less than or equal to md, and N is an arbitrary value; if the calculation parallelism is b, the condition of defining the complex matrix to be solved as the decimal point is as follows: m2Bd is less than or equal to, if the condition is not met, the complex matrix to be solved is judged to be a large point number.
The complex covariance matrix calculation system based on the iteration and reconfigurable mode is further designed in that if the complex matrix to be solved is a small point number, a one-dimensional data transmission mode of DMA is adopted; and if the complex matrix to be solved is a large number of points, adopting a DMA two-dimensional data transmission mode.
The complex covariance matrix calculation system based on the iteration and reconfigurable mode is further designed in that the matrix covariance operation module stores original data according to a rule that one column of a complex matrix to be solved is stored in a single bank, all banks in a single time are stored in one column, the current column is divided into one zone, all banks in a single time are divided into one section, and the total size of a partition in a single time, the total times of the partition in a single time, the number of the zones in the last section and the number of the columns in the last zone are calculated.
The complex covariance matrix computing system based on the iteration and reconfigurable mode is further designed in that a zero filling mechanism is adopted if any area column number is not filled; if the source data storage exceeds the single maximum storage point number, ping-pong operation segmented processing is adopted, and if the source data storage exceeds the single maximum storage point number, batch processing is adopted.
The complex covariance matrix calculation system based on the iteration and reconfigurable mode is further designed in that the matrix covariance operation module adopts the reconfigurable mode to construct a complex multiplication accumulation calculation unit, and iterative calculation is carried out on each segment, each partition and each bank temporary storage result data to obtain a lower triangular covariance matrix.
The complex covariance matrix calculation system based on the iteration and the reconfigurable mode is further designed in that the using number of complex multipliers and complex adders in the complex multiplication accumulation calculation unit is equal to the bank number of source data storage, b is set, and the number of input data of each calculation unit is as follows: 2b + 1; reading the Mth address of all the source data banks as the input A of a complex multiplier; and respectively reading the conjugate of the data of the 1 st to M th addresses of all the source data banks as the input B of the complex multiplier, wherein the other input data is the value of the corresponding storage address in the last storage result bank, and the calculation result needs to be written back to the same address in the same bank again every time.
The complex covariance matrix calculation system based on the iteration and reconfigurable mode is further designed in that the conjugate symmetry module reads data from a bank where the lower triangular covariance matrix is located in sequence, analyzes rows and columns of the data in the lower triangular covariance matrix, and stores the data and conjugate symmetry data thereof into a multiplexed source data bank according to a rule that the rows and the columns of the matrix are stored in one bank until a complete complex covariance matrix is obtained.
The complex covariance matrix computing system based on the iteration and the reconfigurable mode is further designed in that when the acceleration core distributes data read from the DDR, the rows and the columns of the incoming data in the complex matrix to be solved are analyzed according to the addresses incoming by the DMA, and then the incoming data are stored in the corresponding source data bank according to the column distribution principle.
The complex covariance matrix calculation system based on iteration and reconfigurable mode is further designed in that the address mapping adopted by the conjugate symmetry module is as follows: and sequentially reading the data of the lower triangular covariance matrix from the result bank, resolving the row and the column of the original lower triangular covariance matrix according to the bank number of the data and the address in the bank, and storing the current data and the conjugate symmetric data thereof into the reconstructed bank according to the distribution principle that one column of the complete covariance matrix is stored in one bank.
The invention has the following advantages:
the complex covariance matrix calculation system based on the iteration and reconfigurable mode is designed in a parameterization mode, and a scheme with the maximum resource utilization rate is reconfigured for different storage and calculation resources to carry out complex covariance matrix operation. The method reduces the source data calculation amount, improves the resource utilization rate, accelerates the calculation speed of hardware realization, and provides a good reference for the design realization of the calculation covariance matrix in the field of signal processing.
Drawings
FIG. 1 is a hardware implementation architecture diagram of the computational complex covariance matrix of the present invention.
FIG. 2 is a diagram of a hardware implementation architecture for computing a complex covariance matrix in a conventional manner.
FIG. 3 is a schematic diagram of the arrangement of source data in the present invention.
FIG. 4 is a diagram illustrating a source data transport address mapping mechanism according to the present invention.
FIG. 5 is a flow chart of the storage of the dot source data in the present invention.
FIG. 6 is a schematic diagram of a computing unit design in the present invention.
FIG. 7 is a schematic diagram of a source data read of the computing process of the present invention.
FIG. 8 is a schematic diagram of a lower triangular matrix conjugate symmetric address mapping mechanism according to the present invention.
FIG. 9 is a comparison of performance evaluation for the computation of a complex covariance matrix according to the present invention and the conventional method.
Detailed Description
The following describes the present invention in detail with reference to the accompanying drawings.
Referring to fig. 1, the complex covariance matrix calculation system based on iterative and reconfigurable mode in this example mainly includes main modules such as DMA interface function, matrix covariance operation, and conjugate symmetry. The DMA interface function module is mainly responsible for: firstly, data read from an off-chip DDR memory (DDR) by a DMA mode is stored into an on-chip SRAM memory (SRAM) in a partitioning mode, and secondly, the final operation result is written back to the DDR by the DMA mode. The matrix covariance operation module is mainly responsible for polling source data of each region of the SRAM in an iterative computation mode to calculate a lower triangular covariance matrix. The conjugate symmetry module is mainly responsible for obtaining a complete complex covariance matrix by the lower triangular covariance matrix through address mapping and a reconstruction storage mode according to the conjugate symmetry property of the covariance matrix. The method supports the covariance operation of the complex matrix with any column number, reduces the source data calculation amount of the traditional hardware implementation mode and the time for writing back result data to DDR for many times, balances calculation and storage resources to realize maximized multipath parallelism, constructs the calculation unit by using a reconfigurable mode, and sets a specific address mapping rule to obtain the complex covariance matrix.
The on-chip SRAM memory sets storage resources to be divided into k banks, the depth of each bank is d, if M banks are allocated for storing a lower triangular covariance matrix, covariance operation is carried out on a complex matrix with the size of M multiplied by N, and the requirement of M is met2Md is less than or equal to md, and N is an arbitrary value; if the calculation parallelism is b, the condition of defining the complex matrix to be solved as the decimal point is as follows: m2Bd is less than or equal to, if the condition is not met, the complex matrix to be solved is judged to be a large point number.
The complex covariance matrix calculation system based on the iteration and reconfigurable mode is further designed in that if the complex matrix to be solved is a small point number, a one-dimensional data transmission mode of DMA is adopted; and if the complex matrix to be solved is a large number of points, adopting a DMA two-dimensional data transmission mode.
The complex covariance matrix calculation system based on the iteration and reconfigurable mode is further designed in that the matrix covariance operation module stores original data according to a rule that one column of a complex matrix to be solved is stored in a single bank, all banks in a single time are stored in one column, the current column is divided into one zone, all banks in a single time are divided into one section, and the total size of a partition in a single time, the total times of the partition in a single time, the number of the zones in the last section and the number of the columns in the last zone are calculated.
The complex covariance matrix computing system based on the iteration and reconfigurable mode is further designed in that a zero filling mechanism is adopted if any area column number is not filled; if the source data storage exceeds the single maximum storage point number, ping-pong operation segmented processing is adopted, and if the source data storage exceeds the single maximum storage point number, batch processing is adopted.
The complex covariance matrix calculation system based on the iteration and reconfigurable mode is further designed in that the matrix covariance operation module adopts the reconfigurable mode to construct a complex multiplication accumulation calculation unit, and iterative calculation is carried out on each segment, each partition and each bank temporary storage result data to obtain a lower triangular covariance matrix.
The complex covariance matrix calculation system based on the iteration and the reconfigurable mode is further designed in that the using number of complex multipliers and complex adders in the complex multiplication accumulation calculation unit is equal to the bank number of source data storage, b is set, and the number of input data of each calculation unit is as follows: 2b + 1; reading the Mth address of all the source data banks as the input A of a complex multiplier; and respectively reading the conjugate of the data of the 1 st to M th addresses of all the source data banks as the input B of the complex multiplier, wherein the other input data is the value of the corresponding storage address in the last storage result bank, and the calculation result needs to be written back to the same address in the same bank again every time.
The complex covariance matrix calculation system based on the iteration and reconfigurable mode is further designed in that the conjugate symmetry module reads data from a bank where the lower triangular covariance matrix is located in sequence, analyzes rows and columns of the data in the lower triangular covariance matrix, and stores the data and conjugate symmetry data thereof into a multiplexed source data bank according to a rule that the rows and the columns of the matrix are stored in one bank until a complete complex covariance matrix is obtained.
The complex covariance matrix computing system based on the iteration and the reconfigurable mode is further designed in that when the acceleration core distributes data read from the DDR, the rows and the columns of the incoming data in the complex matrix to be solved are analyzed according to the addresses incoming by the DMA, and then the incoming data are stored in the corresponding source data bank according to the column distribution principle.
The complex covariance matrix calculation system based on iteration and reconfigurable mode is further designed in that the address mapping adopted by the conjugate symmetry module is as follows: and sequentially reading the data of the lower triangular covariance matrix from the result bank, resolving the row and the column of the original lower triangular covariance matrix according to the bank number of the data and the address in the bank, and storing the current data and the conjugate symmetric data thereof into the reconstructed bank according to the distribution principle that one column of the complete covariance matrix is stored in one bank.
The following is a detailed description of an example implementation of the invention, and a cycle accurate system level simulation model based on SystemC language is set up for verification.
The invention calculates a complex covariance matrix based on the hardware implementation architecture shown in FIG. 2, in the example, it is assumed that the matrix X is M × N (M is less than or equal to 256, N is less than or equal to 8K), and the generalized covariance result Y is M × M: (hereinafter, "%" indicates a modulo operation)
Figure BDA0001847970200000061
If the memory size is 2MB of SRAM as an example (divided into 32 banks, each bank depth is 8k), the data processing is carried out in a partitioning mode (large number of points and also segmentation) during hardware implementation, and the source data area is filled in a partitioning mode for multiple times according to columns. Considering supporting the ping-pong operation with large dot count, the generalized covariance result Y is 256 × 256 dot counts at most, i.e. the lower triangular matrix is 256 × 256 (256+1)/2 — 32896 dot counts at most, so at least 5 bank storages are required. Therefore, the available 32 banks remove 5 banks storing the calculation results, and all the remaining banks store the source data through the ping-pong operation, so that the number of banks storing the source data at a time is a floor [ (32-5)/2] ═ 13. (floor function: return the largest integer smaller than the parameter).
Since each bank is 8k deep and data is stored in each bank in a single row of M (M is 256 at most), B ═ floor [8k/M ] zones are defined, i.e. a banks of each zone can store a row data, and all banks can store C ═ B row data at most. (definition of large dot number: data size of complex matrix to be solved exceeds C x M)
Taking an M N matrix as an example (N ≦ C), the source data is arranged as shown in FIG. 3. If N > C, then according to the cycle, D ═ ceil (N/C) times are transmitted in total, and the ceil function: the smallest integer greater than or equal to the specified expression is returned. The number of banks occupied by the last zone is (N-1)% A +1, and the remaining banks of the zone are filled with 0.
The AXI bus data bit width adopted in the implementation of this example is 256 bits, and the DMA splits the AXI bus data into 4 data of 64 bits and transmits the data to the DMA interface function of the matrix covariance operation, and the function stores the data into the corresponding bank according to the address mapping mechanism, as shown in fig. 4. Therefore, when the large dot count is operated, the partition size of each section is preferably multiple of 4, so that the number of columns transmitted each time is just multiple of 4, 2 data cannot be written into the same bank at the same time, and the influence of delay memory on the overall operation speed is avoided; for the small dot count operation, since the DMA is transferred once and the number of columns is not necessarily exactly a multiple of 4, 2 data are written to the same bank at the same time, and at this time, the data need to be written in turn with a delay, as shown in fig. 5. For the last section of last area data transmission of large dot number operation, because the DMA adopts a two-dimensional data transmission mode, if the number of the remaining columns is not a multiple of 4, zero padding can be automatically carried out to be a multiple of 4, although zero padding columns are written into the banks, zero padding is just needed to be carried out on the last area, so that the method is not only irrelevant, but also shortens the number of zero padding columns and shortens the storage time of source data.
The reconfigurable computing unit needs to be reconfigured into a 27 complex data input and 5-level full-flow water complex multiplication accumulation unit, as shown in fig. 6, 13 complex multipliers and 13 complex adders are used in total.
According to the conjugate symmetry characteristic of the generalized covariance result, in order to shorten the operation time, only the lower triangle of the complex covariance matrix is calculated, and the upper triangle is expanded by adopting conjugate symmetry. The source data reading form in the calculation process is shown in fig. 7, and the specific operation steps are as follows:
1) the matrix is divided into D sections according to columns, 1 section of data is imported each time, and the source data bank is placed according to the form of a graph 3;
2) the following operation is performed for each area occupied by the current segment: (input Pre _ Region Data is Data of the corresponding position in the same area)
Reading the 1 st number from the A banks simultaneously as an input I1, and taking the conjugate of the 1 st number as an input I2;
reading the 2 nd number from the A banks simultaneously as an input I1, and sequentially reading the 1 st and 2 nd numbers to take a conjugate as an input I2;
reading the 3 rd number from the A banks simultaneously as an input I1, and sequentially reading the 1 st, 2 nd and 3 rd numbers to take conjugation as an input I2;
……;
reading the Mth number from the A banks simultaneously as an input I1, and sequentially reading the 1 st, the 2 nd and the … …, wherein the Mth number takes a conjugate as an input I2;
3) repeating the step 2) to finish the operation of all the areas of the current section;
4) repeating the step 2) and the step 3) to complete calculation of all the sections to obtain M (M +1)/2 results;
5) and expanding the M (M +1)/2 results into M results through conjugate symmetry and storing the M results into a new storage array again.
Since the source data storage area can be reused after the calculation is completed, a new storage array is constructed to store the covariance calculation results of M matrix/M matrix, in order to correspond to that the DMA transfers data with 256 bits each time, and in consideration of the convenience of taking out 4 data each time, the new storage array is planned as bank0-bank15, the calculated M matrix/M complex covariance matrix is stored in the new storage array in sequence according to columns, and a specific lower triangular matrix conjugate symmetric address mapping mechanism is shown in FIG. 8.
The performance of this example was evaluated as follows: 1) the number of the sections occupying the B areas is floor (N/C); 2) the number of zones occupied by the last stage is ceil ((N% C)/A); 3) taking the number of cycles of the input I1 as M during operation of each Region, and taking the number of cycles of the input I2 and the input Pre _ Region Data as M x (M +1)/2 in parallel; 4) the time of conjugate symmetric expansion of the lower triangular matrix is M (M + 1)/2; 5) the complex multiply accumulate unit calculates time as T1, and reads and stores data from the bank as T2 and T3, respectively. The total number of operation cycles of the complex covariance matrix is as follows: floor (N/C) [ (M +1)/2+ M) × B ] + ceil [ (N% C)/a ] × [ M × (M +1)/2+ M ] + M × (M +1)/2+ T1+ T2+ T3. While the conventional way calculates the number of cycles of the complex covariance matrix as follows: m N (M + 4)/4.
T1, T2, T3 are negligible because they are small relative to the total number of cycles. The performance pairs for the above 2 implementations are shown in fig. 9 when covariance operations are performed on complex matrices of different sizes. As is apparent from the figure, the cycle number of calculating the complex covariance matrix based on the iteration and the reconfigurable mode is greatly reduced compared with the traditional mode. The complex covariance matrix calculation system based on the iteration and reconfigurable mode supports complex matrixes with any column number to perform covariance calculation, reduces the source data calculation amount of the traditional hardware implementation mode and the time of writing back result data to DDR for multiple times, balances calculation and storage resources to realize maximized multipath parallelism, constructs a calculation unit by using the reconfigurable mode, sets a specific address mapping rule to obtain the complex covariance matrix, and greatly improves the resource utilization rate and the calculation speed of hardware implementation. As typical operation in the field of signal processing, the hardware implementation method has good reference significance and wide application prospect.

Claims (9)

1. A complex covariance matrix computing system based on iteration and reconfigurable mode comprises an on-chip SRAM (static random access memory), an off-chip DDR (double data rate) memory, a reconfigurable unit, a DMA (direct memory access) controller and an acceleration core, wherein the acceleration core is respectively in communication connection with the on-chip SRAM memory and the reconfigurable unit, the DMA controller is in communication connection with the reconfigurable unit, the off-chip DDR memory is in communication connection with the DMA controller through a bus, and the complex covariance matrix computing system is characterized in that the acceleration core comprises:
the matrix covariance operation module polls the source data of each area of the on-chip SRAM memory in an iterative computation mode and computes a lower triangular covariance matrix;
the conjugate symmetry module obtains a complete complex covariance matrix from the lower triangular covariance matrix in an address mapping and reconstruction storage mode according to the conjugate symmetry property of the covariance matrix to form a final operation result;
the DMA interface function module is used for storing data read from the off-chip DDR memory in a DMA mode into the on-chip SRAM memory in a partition mode; writing the operation result back to the off-chip DDR memory in a DMA mode;
the matrix covariance operation module stores original data according to a rule that one column of a complex matrix to be solved is stored in a single bank, all banks in a single time are stored in one column, a current column is divided into one zone, all banks in a single time are divided into one section, and the total size of the partitions in the single time, the total times of segmentation, the number of the zones in the last section and the number of the columns in the last zone are calculated.
2. The stack-based of claim 1The complex covariance matrix calculation system in a generation and reconstruction mode is characterized in that the on-chip SRAM memory sets storage resources to be divided into k banks, the depth of each bank is d, if M banks are allocated to store a lower triangular covariance matrix, covariance operation is carried out on a complex matrix with the size of M multiplied by N, and the requirement of M is met2Md is less than or equal to md, and N is an arbitrary value; if the calculation parallelism is b, the condition of defining the complex matrix to be solved as the decimal point is as follows: m2Bd is less than or equal to, if the condition is not met, the complex matrix to be solved is judged to be a large point number.
3. The iterative and reconfigurable complex covariance matrix calculation system according to claim 2, wherein if the complex matrix to be solved is a small number of dots, a one-dimensional data transmission mode of DMA is adopted; and if the complex matrix to be solved is a large number of points, adopting a DMA two-dimensional data transmission mode.
4. The iterative and reconfigurable complex covariance matrix computing system of claim 1, wherein if any number of columns is not filled, a zero padding mechanism is employed; if the source data storage exceeds the single maximum storage point number, ping-pong operation segmented processing is adopted, and if the source data storage exceeds the single maximum storage point number, batch processing is adopted.
5. The complex covariance matrix calculation system based on the iterative and reconfigurable mode as claimed in claim 1, wherein the matrix covariance operation module adopts the reconfigurable mode to construct a complex multiplication accumulation calculation unit, and performs iterative calculation on each segment, each partition and each bank temporary storage result data to obtain a lower triangular covariance matrix.
6. The iterative and reconfigurable complex covariance matrix calculation system according to claim 5, wherein the number of complex multipliers and complex adders in the complex multiply accumulate calculation unit is equal to the number of banks in the source data storage, and is set as b, and the number of data input by each calculation unit is: 2b + 1; reading the Mth address of all the source data banks as the input A of a complex multiplier; and respectively reading the conjugate of the data of the 1 st to M th addresses of all the source data banks as the input B of the complex multiplier, wherein the other input data is the value of the corresponding storage address in the last storage result bank, and the calculation result needs to be written back to the same address in the same bank again every time.
7. The iterative and reconfigurable complex covariance matrix calculation system according to claim 1, wherein the conjugate symmetry module sequentially reads data from a bank where the lower triangular covariance matrix is located, analyzes rows and columns of the data in the lower triangular covariance matrix, and stores the data and conjugate symmetry data thereof in a multiplexed source data bank according to a rule that the matrix columns are stored in one bank until a complete complex covariance matrix is obtained.
8. The iterative and reconfigurable complex covariance matrix calculation system according to claim 1, wherein when the acceleration core distributes data read from the DDR, the acceleration core resolves rows and columns of the incoming data in the complex matrix to be solved according to the address incoming from the DMA, and stores the resulting data in the corresponding source data bank according to a column-based distribution principle.
9. The iterative and reconfigurable complex covariance matrix computing system of claim 1, wherein the conjugate symmetry blocks are mapped at the addresses used as: and sequentially reading the data of the lower triangular covariance matrix from the result bank, resolving the row and the column of the original lower triangular covariance matrix according to the bank number of the data and the address in the bank, and storing the current data and the conjugate symmetric data thereof into the reconstructed bank according to the distribution principle that one column of the complete covariance matrix is stored in one bank.
CN201811284263.4A 2018-10-30 2018-10-30 Complex covariance matrix calculation system based on iteration and reconfigurable mode Active CN109446478B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811284263.4A CN109446478B (en) 2018-10-30 2018-10-30 Complex covariance matrix calculation system based on iteration and reconfigurable mode

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811284263.4A CN109446478B (en) 2018-10-30 2018-10-30 Complex covariance matrix calculation system based on iteration and reconfigurable mode

Publications (2)

Publication Number Publication Date
CN109446478A CN109446478A (en) 2019-03-08
CN109446478B true CN109446478B (en) 2021-09-28

Family

ID=65550425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811284263.4A Active CN109446478B (en) 2018-10-30 2018-10-30 Complex covariance matrix calculation system based on iteration and reconfigurable mode

Country Status (1)

Country Link
CN (1) CN109446478B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614582B (en) * 2018-11-06 2020-08-11 海南大学 Lower triangular part storage device of self-conjugate matrix and parallel reading method
CN111045965B (en) * 2019-10-25 2021-06-04 南京大学 Hardware implementation method for multi-channel conflict-free splitting, computer equipment and readable storage medium for operating method
CN111723336B (en) * 2020-06-01 2023-01-24 南京大学 Cholesky decomposition-based arbitrary-order matrix inversion hardware acceleration system adopting loop iteration mode

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002037259A1 (en) * 2000-11-01 2002-05-10 Bops, Inc. Methods and apparatus for efficient complex long multiplication and covariance matrix implementation
EP1215507A2 (en) * 2000-12-12 2002-06-19 Matsushita Electric Industrial Co., Ltd. Radio-wave arrival-direction estimating apparatus and directional variable transceiver
CN101211333A (en) * 2006-12-30 2008-07-02 北京邮电大学 Signal processing method, device and system
CN103685110A (en) * 2013-12-17 2014-03-26 京信通信系统(中国)有限公司 Predistortion processing method and system and predistortion factor arithmetic unit
CN105426345A (en) * 2015-12-25 2016-03-23 南京大学 Matrix inverse operation method
CN105630735A (en) * 2015-12-25 2016-06-01 南京大学 Coprocessor based on reconfigurable computational array
CN105893333A (en) * 2016-03-25 2016-08-24 合肥工业大学 Hardware circuit for calculating covariance matrix in MUSIC algorithm
US9686069B2 (en) * 2015-05-22 2017-06-20 ZTE Canada Inc. Adaptive MIMO signal demodulation using determinant of covariance matrix

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002037259A1 (en) * 2000-11-01 2002-05-10 Bops, Inc. Methods and apparatus for efficient complex long multiplication and covariance matrix implementation
EP1215507A2 (en) * 2000-12-12 2002-06-19 Matsushita Electric Industrial Co., Ltd. Radio-wave arrival-direction estimating apparatus and directional variable transceiver
CN101211333A (en) * 2006-12-30 2008-07-02 北京邮电大学 Signal processing method, device and system
CN103685110A (en) * 2013-12-17 2014-03-26 京信通信系统(中国)有限公司 Predistortion processing method and system and predistortion factor arithmetic unit
US9686069B2 (en) * 2015-05-22 2017-06-20 ZTE Canada Inc. Adaptive MIMO signal demodulation using determinant of covariance matrix
CN105426345A (en) * 2015-12-25 2016-03-23 南京大学 Matrix inverse operation method
CN105630735A (en) * 2015-12-25 2016-06-01 南京大学 Coprocessor based on reconfigurable computational array
CN105893333A (en) * 2016-03-25 2016-08-24 合肥工业大学 Hardware circuit for calculating covariance matrix in MUSIC algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
一种高精度的大点数二维FFT处理器设计;于东等;《现代雷达》;20160531;第38卷(第5期);第16-21页 *
基于数据阵共辘重构的MUSIC角估计算法;何子述等;《电子科技大学学报》;19990430;第28卷(第2期);第111-115页 *

Also Published As

Publication number Publication date
CN109446478A (en) 2019-03-08

Similar Documents

Publication Publication Date Title
CN110097174B (en) Method, system and device for realizing convolutional neural network based on FPGA and row output priority
CN111758107B (en) System and method for hardware-based pooling
CN108416434B (en) Circuit structure for accelerating convolutional layer and full-connection layer of neural network
CN111242289B (en) Convolutional neural network acceleration system and method with expandable scale
CN109446478B (en) Complex covariance matrix calculation system based on iteration and reconfigurable mode
CN111667051A (en) Neural network accelerator suitable for edge equipment and neural network acceleration calculation method
CN106021182B (en) A kind of row transposition architecture design method based on Two-dimensional FFT processor
CN108897716B (en) Data processing device and method for reducing calculation amount through memory read-write operation
US20230025068A1 (en) Hybrid machine learning architecture with neural processing unit and compute-in-memory processing elements
KR20230081697A (en) Method and apparatus for accelerating dilatational convolution calculation
Niu et al. SPEC2: Spectral sparse CNN accelerator on FPGAs
US20220188613A1 (en) Sgcnax: a scalable graph convolutional neural network accelerator with workload balancing
CN111610963B (en) Chip structure and multiply-add calculation engine thereof
CN115965067A (en) Neural network accelerator for ReRAM
WO2023279004A1 (en) Computation in memory architecture for depth-wise convolution
KR20240058084A (en) Parallel depth-specific processing architectures for neural networks
Alhussain et al. Hardware-efficient template-based deep CNNs accelerator design
WO2023019103A1 (en) Partial sum management and reconfigurable systolic flow architectures for in-memory computation
CN115424038A (en) Multi-scale image processing method, system and device and computer equipment
CN114912596A (en) Sparse convolution neural network-oriented multi-chip system and method thereof
Zhang et al. A High-Efficient and Configurable Hardware Accelerator for Convolutional Neural Network
CN116781484B (en) Data processing method, device, computer equipment and storage medium
US12056459B2 (en) Compute in memory architecture and dataflows for depth-wise separable convolution
CN110728367B (en) Data storage method and device for neural network
WO2023004570A1 (en) Activation buffer architecture for data-reuse in a neural network accelerator

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant