CN108762719B

CN108762719B - Parallel generalized inner product reconstruction controller

Info

Publication number: CN108762719B
Application number: CN201810497969.2A
Authority: CN
Inventors: 李丽; 祁鹏展; 鲍贤亮; 宋文清; 李伟; 何书专; 潘红兵
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2018-05-21
Filing date: 2018-05-21
Publication date: 2023-06-06
Anticipated expiration: 2038-05-21
Also published as: CN108762719A

Abstract

The parallel generalized inner product reconstruction controller of the invention comprises: an intermediate result calculation module for receiving the source data, calculating an intermediate result vector Y_L based on the source data, and generating the address at which the vector Y_L is stored in the bank, wherein each time the calculation of one intermediate result vector Y_L is completed, a completion signal is generated and sent to a final result calculation module as a start signal; the final result calculation module, which feeds the read data into a complex multiply accumulator to calculate the final result, obtaining the L-th element Z_L of the result matrix Z_1xN, and generates the address at which Z_L is stored in the bank; and a data storage address processing module for selecting data according to the ping-pong operation selection signal and generating the correct bank address signals. The beneficial effects are short calculation time and high storage resource utilization, which meet the high real-time requirement for obtaining test statistics when non-uniform detection is performed in many signal detection application scenarios.

Description

Parallel generalized inner product reconstruction controller

Technical Field

The invention belongs to the technical field of non-uniform detection, and particularly relates to a parallel generalized inner product reconstruction controller.

Background

Space-time adaptive processing (STAP) is a detection technique for moving targets. The conventional STAP algorithm requires an estimate of the clutter covariance matrix. When secondary data are used to estimate the clutter covariance matrix, the secondary data must be independent and identically distributed (i.i.d.) to limit the performance loss.

In practical applications, the received signal echoes are contaminated not only by natural clutter but also by man-made non-uniform interference, so the i.i.d. condition is often not satisfied.

To deal with interference targets in the samples, Melvin first proposed the idea of a non-uniform detector (NHD), which suppresses their effect on clutter covariance matrix estimation by rejecting the samples that contain an interference target. The basic idea of the NHD is to define a test statistic that distinguishes samples contaminated by an interference target from the other samples, based on the difference in their statistical properties.

Regarding the selection of NHD test statistics, Gerlach et al. of the U.S. Naval Research Laboratory proposed two criteria: the generalized inner product (GIP) and the adaptive power residue. Let X_L denote the L-th sample of the initial samples; its corresponding autocorrelation matrix is expressed as:

[equation not preserved in this text]

where T is the noise covariance matrix. Let [symbol not preserved in this text] denote the sample covariance matrix composed of L samples; the GIP value corresponding to each sample can then be expressed as:

[equation not preserved in this text]

According to the GIP value corresponding to each sample, the interference target can be effectively eliminated.
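For orientation, the generalized inner product is commonly written in the non-uniform detection literature in the form below. The symbols K (number of samples) and \hat{R} (sample covariance matrix) are notation assumed here rather than taken from this text, and the patent's own equations may differ in normalization or notation.

```latex
\hat{R} = \frac{1}{K}\sum_{k=1}^{K} X_k X_k^{H},
\qquad
\mathrm{GIP}_L = X_L^{H}\,\hat{R}^{-1}\,X_L
```

Samples whose GIP value deviates strongly from the values of the other samples are the candidates for rejection.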

The clutter suppression capability of the generalized inner product non-uniform detection method depends on the number of samples: the larger the number of samples, the more accurate the clutter covariance matrix estimate and the stronger the clutter suppression. Software implementations of generalized inner product non-uniform detection suffer from low precision and excessively long run times when a large number of samples is processed, and therefore cannot meet the high real-time requirements of practical non-uniform detection.

Disclosure of Invention

The invention aims to overcome the deficiencies described in the background and provides a parallel generalized inner product reconstruction controller that better meets the requirements of high real-time performance and large point counts in practical applications. This is achieved by the following technical solution:

The parallel generalized inner product reconstruction controller comprises:

an intermediate result calculation module for receiving the source data, calculating an intermediate result vector Y_L based on the source data, and generating the address at which the vector Y_L is stored in the bank; each time the calculation of one intermediate result vector Y_L is completed, a completion signal is generated and sent to a final result calculation module as a start signal;

the final result calculation module, which continuously generates, through an address generator, the addresses of the elements of column X_L of the matrix X and of the elements of the corresponding intermediate result vector Y_L; the read data enter a complex multiply accumulator to obtain the L-th element Z_L of the result matrix Z_1xN, and the address at which Z_L is stored in the bank is generated;

and a data storage address processing module for selecting data according to the ping-pong operation selection signal and processing signals from the intermediate result calculation module and the final result calculation module that target the same bank, so as to generate the correct bank address signals.

In a further design of the hardware implementation of the parallel generalized inner product operation, Y_L is calculated by multiplying and accumulating X_L with each column of a square matrix T, where the number of rows and columns of the square matrix T is equal to the number of columns of the matrix X, and the columns are processed by multi-way parallel calculation.
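As a plain software reference for the two stages, the sketch below computes one intermediate vector Y_L and one final element Z_L. It assumes T is an M x M matrix given as nested lists, that X_L has M elements, and that the products are taken without conjugation, since the text does not mention a conjugate; the function names are hypothetical.

```python
def intermediate_vector(x_col, t_matrix):
    """Return Y_L: X_L multiplied and accumulated against each column of T."""
    m = len(x_col)
    y = []
    for j in range(m):                  # one output element per column of T
        acc = 0j
        for i in range(m):              # multiply-accumulate down column j
            acc += x_col[i] * t_matrix[i][j]
        y.append(acc)
    return y


def final_element(x_col, y_vec):
    """Return Z_L: multiply-accumulate of X_L with its intermediate vector Y_L."""
    acc = 0j
    for xi, yi in zip(x_col, y_vec):
        acc += xi * yi
    return acc
```

If the intended statistic uses the conjugate transpose of X_L, as in the textbook generalized inner product, `xi` in `final_element` would be replaced by `xi.conjugate()`.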

In a further design of the hardware implementation of the parallel generalized inner product operation, the intermediate result calculation module is implemented in a four-way parallel manner.

In a further design of the hardware implementation of the parallel generalized inner product operation, the source data of the intermediate result calculation module are stored as follows: the matrix T is stored by column in bank0-bank3 and, once these are full, continues by column in bank4-bank7; the matrix X is stored by column in bank8-bank11.

In a further design of the hardware implementation of the parallel generalized inner product operation, the intermediate results of the intermediate result calculation module are stored as follows: odd-numbered items are stored in bank12 and even-numbered items in bank13.

In a further design of the hardware implementation of the parallel generalized inner product operation, the intermediate result calculation module calculates the intermediate result as follows: in one operation, the address generator first generates the addresses of the elements of one column X_L of X and of four columns of the matrix T; the corresponding matrix element data are fetched at the same time and fed into the complex multiply accumulators to obtain the intermediate result Y_L; the address generator then generates the intermediate result storage address, and the intermediate result is stored in the bank.

In a further design of the hardware implementation of the parallel generalized inner product operation, the final result calculation module calculates the final result as follows: when the final result calculation module receives the intermediate result calculation completion signal, the address generator continuously generates the addresses of the elements of column X_L of the matrix X and of the elements of the corresponding intermediate result vector Y_L; the data are fed simultaneously into a complex multiply accumulator to obtain the final result Z_L; the address generator then generates the final result storage address, and the final result is stored in the bank.

In a further design of the hardware implementation of the parallel generalized inner product operation, the complex multipliers are all pipelined single-precision floating-point units with a latency of 4 clock cycles, and the memory access latency is set to 6 cycles.

In a further design of the hardware implementation of the parallel generalized inner product operation, there are five complex multiply accumulators, four of which compute the intermediate results in four-way parallel while the remaining one computes the final result concurrently.

In a further design of the hardware implementation of the parallel generalized inner product operation, each complex multiply accumulator consists of one complex multiplier and three complex adders, and its area after DC synthesis in a 40 nm CMOS process is 19993.56 μm².

Advantages of the Invention

The parallel generalized inner product reconstruction controller provided by the invention adopts a strategy of calculating an intermediate result and then immediately calculating the corresponding final result element, so that the time for calculating Z_{L-1} is hidden within the time for calculating Y_L; the calculation time is therefore short and the storage resource utilization is high. The controller can meet the high real-time requirement for obtaining test statistics when non-uniform detection is performed in many signal detection application scenarios.

Drawings

FIG. 1 is a schematic diagram of an architecture of a parallel generalized inner product reconstruction controller.

FIG. 2 is a schematic diagram of parallel generalized inner product data storage.

FIG. 3 is a schematic diagram of a parallel generalized inner product algorithm calculation flow.

Detailed Description

The invention is described in detail below with reference to the drawings and specific embodiments.

As shown in FIG. 1, the parallel generalized inner product reconstruction controller of this embodiment uses four-way parallel operation and consists mainly of three sub-modules: an intermediate result calculation module, a final result calculation module, and a data storage address processing module. The intermediate result calculation module calculates the intermediate results; the final result calculation module calculates the final results; the data storage address processing module processes related signals such as bank addresses.

The intermediate result calculation module carries out the complete pipeline for calculating the intermediate result vector Y_L. This includes generating the addresses of the elements of column X_L, performing inner-product multiply-accumulate operations between the column X_L and each column of the square matrix T_MxM to obtain the intermediate result vector Y_L, and generating the address at which Y_L is stored in the bank. Each time one Y_L is completed, a completion signal is given to the final result calculation module as the start signal for one of its calculations.

The final result calculation module continuously generates, through an address generator, the addresses of the elements of column X_L of the matrix X and of the elements of the corresponding intermediate result vector Y_L; the read data enter a complex multiply accumulator to obtain the L-th element Z_L of the result matrix Z_1xN, and the address at which Z_L is stored in the bank is generated.

And the data storage address processing module is used for selecting data according to the ping-pong operation selection signal, processing signals aiming at the same bank from the intermediate result calculation module and the final result calculation module, and generating signals such as correct bank addresses.

As shown in FIG. 1, the memory unit comprises 15 banks: the matrix T is stored in bank0-bank7, the matrix X in bank8-bank11, the intermediate results Y_L in bank12 and bank13, and the final parallel generalized inner product matrix in bank14. The arithmetic unit comprises 5 complex multiply accumulators: complex multiply accumulators 0-3 compute the intermediate results in four-way parallel, and complex multiply accumulator 4 computes the final result at the same time.

A schematic diagram of parallel generalized inner product data storage is shown in FIG. 2. The source data are stored as follows: the matrix T is stored by column in bank0-bank3 and, once these are full, continues by column in bank4-bank7; the matrix X is stored by column in bank8-bank11. This layout supports four-way parallel operation when calculating the intermediate result Y_L and simplifies the design of the corresponding DMA module. For the intermediate results Y_L, the odd-numbered items Y_1, Y_3, ... are stored in bank12 (each overwriting the previous one), and the even-numbered items Y_2, Y_4, ... are stored in bank13 (each overwriting the previous one). The final generalized inner product matrix is stored in bank14.
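The bank assignment described above can be summarized in a small sketch. The round-robin placement of T columns over four banks and the `cols_per_bank` capacity parameter are assumptions made for illustration; the text only states that T fills bank0-bank3 by column and then continues in bank4-bank7.

```python
FINAL_RESULT_BANK = 14  # the final generalized inner product matrix


def t_column_bank(col, cols_per_bank):
    """Bank holding column `col` (0-based) of T.

    Assumption: columns are dealt round-robin over four banks so that four
    consecutive columns can be read in parallel; once bank0-bank3 are full
    (cols_per_bank columns each), storage continues in bank4-bank7.
    """
    if col < 0 or col >= 8 * cols_per_bank:
        raise ValueError("column does not fit in bank0-bank7")
    lane = col % 4
    if col < 4 * cols_per_bank:
        return lane        # bank0-bank3
    return 4 + lane        # bank4-bank7


def y_bank(index):
    """Bank holding intermediate result Y_index (1-based): odd -> 12, even -> 13."""
    return 12 if index % 2 == 1 else 13
```

Because consecutive intermediate results alternate between bank12 and bank13, the final result unit can read Y_{L-1} from one bank while Y_L is being written to the other.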

As shown in FIG. 3, the parallel generalized inner product algorithm calculates the intermediate result as follows: in one operation, address generator 1 first generates the addresses of the elements of one column X_L of X and of four columns of the matrix T; the corresponding matrix element data are fetched at the same time and fed into the complex multiply accumulators to obtain the intermediate result Y_L; address generator 2 then generates the intermediate result storage address, and the intermediate result is stored in the bank.

Similarly, the parallel generalized inner product algorithm calculates the final result as follows: in one operation, when the module receives the intermediate result calculation completion signal, address generator 1 continuously generates the addresses of the elements of column X_L of the matrix X and of the elements of the corresponding intermediate result vector Y_L. The data are fed simultaneously into a complex multiply accumulator to obtain the final result Z_L; address generator 2 then generates the final result storage address, and the final result is stored in the bank.
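The four-way grouping in the intermediate result flow can be illustrated with a software model in which four accumulators, one per lane, process a group of four columns of T against the same column X_L. This is a behavioural sketch only: the name `y_four_lane` is hypothetical, T is assumed to be an M x M nested list, and no pipeline or memory timing is modelled.

```python
LANES = 4  # four-way parallel intermediate-result calculation


def y_four_lane(x_col, t_matrix):
    """Compute Y_L in groups of four T columns, one accumulator per lane."""
    m = len(x_col)
    y = [0j] * m
    for base in range(0, m, LANES):               # next group of four columns
        cols = range(base, min(base + LANES, m))
        acc = [0j] * len(cols)                    # one accumulator per active lane
        for i in range(m):                        # each X element is shared by all lanes
            for lane, j in enumerate(cols):
                acc[lane] += x_col[i] * t_matrix[i][j]
        for lane, j in enumerate(cols):           # write the finished group back
            y[j] = acc[lane]
    return y
```

In hardware the inner loop over lanes runs concurrently; here it is sequential, so the model reproduces the results but not the timing.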

The hardware implementation of the parallel generalized inner product algorithm of the invention comprises the following steps:

Step 1) Set L = 1, starting from the first column of the matrix X;

Step 2) Calculate the intermediate result Y_L.

Calculating the intermediate result Y_L comprises the following steps:

Step 2-1) According to the addresses generated by the address generator sub-module, sequentially fetch the elements of X_L and (T_1 T_2 T_3 T_4) and send them to the multiply-accumulate sub-module for complex multiply-accumulate operations to obtain (Y_L1 Y_L2 Y_L3 Y_L4);

Step 2-2) According to the addresses generated by the address generator sub-module, write (Y_L1 Y_L2 Y_L3 Y_L4) sequentially into the intermediate result bank while fetching the next group of four columns of T matrix elements and X_L; repeat steps 2-1) and 2-2) until the calculation of Y_L is complete;

Step 3) Calculate the final result. Concurrently with steps 2-1) and 2-2), if Y_{L-1} has already been generated, sequentially fetch the elements of X_{L-1} and Y_{L-1} according to the addresses generated by the address generator, perform complex multiply-accumulate operations to obtain Z_{L-1}, and write the final result into the final result bank according to the address generated by the address generator;

Step 4) If L < N, set L = L + 1 and jump to step 2);

Step 5) Sequentially fetch the elements of X_N and Y_N, perform complex multiply-accumulate operations to obtain Z_N, and store the result in the bank, completing the inner product operation.
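The ordering of work in the steps above can be sketched as a simple schedule, under the reading that Z_{L-1} is computed alongside Y_L. The function name `gip_schedule` and the tuple representation are assumptions made for illustration.

```python
def gip_schedule(n):
    """List the work issued in each outer iteration for N columns.

    Each entry is (y_index, z_index): the intermediate vector computed by the
    four-lane unit and the final element computed alongside it. None marks an
    idle unit.
    """
    steps = []
    for col in range(1, n + 1):
        z_index = col - 1 if col > 1 else None    # Z_{L-1} needs Y_{L-1} first
        steps.append((col, z_index))
    steps.append((None, n))                       # step 5: drain the last element
    return steps
```

For three columns this gives `[(1, None), (2, 1), (3, 2), (None, 3)]`: only the first and last entries leave one of the two units idle.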

The complex multipliers and complex adders used in the parallel generalized inner product reconstruction controller of this embodiment are all pipelined single-precision floating-point units with a latency of 4 clock cycles, and the memory access latency is 6 cycles. Using EDA simulation and synthesis tools, the operating clock frequency reaches 1 GHz.

The parallel generalized inner product reconstruction controller of this embodiment has a total of five complex multiply accumulators, four of which compute the intermediate results in four-way parallel while the remaining one computes the final result concurrently. Each complex multiply accumulator consists of one complex multiplier and three complex adders, and its area after DC synthesis in a 40 nm CMOS process is 19993.56 μm².
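The arithmetic performed by one complex multiply-accumulate step can be written out in terms of real operations, as in the sketch below. It shows only the mathematics; it does not model the internal arrangement of the multiplier and adders, the 4-cycle pipeline, or single-precision rounding, and the function name is hypothetical.

```python
def complex_mac(acc, a, b):
    """Return acc + a * b, with each value given as a (real, imag) tuple."""
    ar, ai = a
    br, bi = b
    prod_r = ar * br - ai * bi      # real part of a * b
    prod_i = ar * bi + ai * br      # imaginary part of a * b
    return (acc[0] + prod_r, acc[1] + prod_i)
```

For example, accumulating (1 + 2j)(3 + 4j) onto 1 + 1j yields -4 + 11j, which is `complex_mac((1.0, 1.0), (1.0, 2.0), (3.0, 4.0))`.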

The parallel generalized inner product reconstruction controller of this embodiment adopts a strategy of calculating an intermediate result and then immediately calculating the corresponding final result element, so that the time for calculating Z_{L-1} is hidden within the time for calculating Y_L. Compared with calculating the final results in parallel only after all intermediate results have been calculated, this requires less calculation time and achieves higher storage resource utilization.

The parallel generalized inner product reconstruction controller of this embodiment offers high calculation speed, a flexibly variable number of points, and high storage resource utilization. It can meet the high real-time requirement for obtaining test statistics during non-uniform detection in digital signal processing applications with large data volumes, such as real-time signal detection.
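A rough cycle count illustrates how much time the overlap can hide. The model below is an assumption-laden estimate, not a figure from the text: it counts one multiply-accumulate per lane per cycle, ignores pipeline and memory latency, and compares against a simple non-overlapped baseline chosen here for illustration, which is not necessarily the alternative the text has in mind.

```python
import math


def cycle_estimate(m, n, lanes=4):
    """Rough MAC-issue cycle counts for N columns of length M.

    Assumptions: one multiply-accumulate per lane per cycle; pipeline and
    memory latencies are ignored; Y_L takes ceil(M / lanes) * M cycles and
    Z_L takes M cycles.
    """
    y_cycles = math.ceil(m / lanes) * m
    z_cycles = m
    serial = n * (y_cycles + z_cycles)                     # Z_L only after Y_L, no overlap
    overlapped = n * max(y_cycles, z_cycles) + z_cycles    # Z_{L-1} hidden under Y_L
    return serial, overlapped
```

With M = 16 and N = 10 the estimate is 800 cycles without overlap and 656 with it. On the storage side, only two intermediate vectors need to be held at any time because each Y_L overwrites the one written two iterations earlier.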

The present invention is not limited to the above embodiments; any changes or modifications that are apparent to those skilled in the art within the technical scope of the present invention fall within its scope of protection. Therefore, the protection scope of the present invention is defined by the claims.

Claims

1. A parallel generalized inner product reconstruction controller, characterized by comprising:

a final result calculation module, which continuously generates, through an address generator, the addresses of the elements of column X_L of the matrix X and of the elements of the corresponding intermediate result vector Y_L; the read data enter a complex multiply accumulator to calculate the final result, obtaining the L-th element Z_L of the result matrix Z_1xN, and the address at which Z_L is stored in the bank is generated;

2. The parallel generalized inner product reconstruction controller according to claim 1, wherein: Y_L is calculated by multiplying and accumulating X_L with each column of a square matrix T, where the number of rows and columns of the square matrix T is equal to the number of columns of the matrix X, and the columns are processed by multi-way parallel calculation.

3. The parallel generalized inner product reconstruction controller according to claim 2, wherein: the intermediate result calculation module is implemented in a four-way parallel manner.

4. The parallel generalized inner product reconstruction controller according to claim 3, wherein: the source data of the intermediate result calculation module are stored as follows: the matrix T is stored by column in bank0-bank3 and, once these are full, continues by column in bank4-bank7; the matrix X is stored by column in bank8-bank11.

5. The parallel generalized inner product reconstruction controller according to claim 3, wherein: the intermediate results of the intermediate result calculation module are stored as follows: odd-numbered items are stored in bank12 and even-numbered items in bank13.

6. The parallel generalized inner product reconstruction controller according to claim 1, wherein: the intermediate result calculation module calculates the intermediate result as follows: in one operation, the address generator first generates the addresses of the elements of one column X_L of X and of four columns of the matrix T; the corresponding matrix element data are fetched at the same time and fed into the complex multiply accumulators to obtain the intermediate result Y_L; the address generator then generates the intermediate result storage address, and the intermediate result is stored in the bank.

7. The parallel generalized inner product reconstruction controller according to claim 1, wherein: the final result calculation module calculates the final result as follows: when the final result calculation module receives the intermediate result calculation completion signal, the address generator continuously generates the addresses of the elements of column X_L of the matrix X and of the elements of the corresponding intermediate result vector Y_L; the data are fed simultaneously into a complex multiply accumulator to obtain the final result Z_L; the address generator then generates the final result storage address, and the final result is stored in the bank.

8. The parallel generalized inner product reconstruction controller according to claim 1, wherein: the complex multiply accumulator is a pipelined single-precision floating-point unit with a latency of 4 clock cycles, and the memory access latency of the complex multiply accumulator is set to 6 cycles.

9. The parallel generalized inner product reconstruction controller according to claim 1, wherein: there are five complex multiply accumulators, four of which compute the intermediate results in four-way parallel while the remaining one computes the final result concurrently.

10. The parallel generalized inner product reconstruction controller according to claim 1, wherein: each complex multiply accumulator consists of one complex multiplier and three complex adders, and its area after DC synthesis in a 40 nm CMOS process is 19993.56 μm².