CN108762719A

CN108762719A - A parallel generalized inner product reconfigurable controller

Info

Publication number: CN108762719A
Application number: CN201810497969.2A
Authority: CN
Inventors: 李丽; 祁鹏展; 鲍贤亮; 宋文清; 李伟; 何书专; 潘红兵
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2018-05-21
Filing date: 2018-05-21
Publication date: 2018-11-06
Anticipated expiration: 2038-05-21
Also published as: CN108762719B

Abstract

The parallel generalized inner product reconfigurable controller of the present invention comprises: an intermediate result computing module, which receives source data, calculates the intermediate result vector Y_L from the source data, generates the address of vector Y_L, and stores it in a bank; each time the calculation of an intermediate result vector Y_L is completed, it generates a completion signal and sends it to the final result computing module as an enable signal; a final result computing module, which reads data into a complex multiply-accumulator to calculate the L-th element Z_L of the result matrix Z_1xN, generates the address of Z_L, and stores it in a bank; and a storage data address processing module, which performs data selection according to the ping-pong operation selection signal and generates correct bank address signals. Advantageous effects: the computation time is short and the storage resource utilization is high, which meets the high real-time requirement for obtaining test statistics when performing non-homogeneous detection in many signal detection application scenarios.

Description

A parallel generalized inner product reconfigurable controller

Technical field

The invention belongs to the technical field of non-homogeneous detection, and in particular relates to a parallel generalized inner product reconfigurable controller.

Background technology

Space-time adaptive processing (STAP) is a technique for detecting moving targets. Conventional STAP algorithms require an estimate of the clutter covariance matrix. When the clutter covariance matrix is estimated from secondary data, the secondary data must be independent and identically distributed in order to limit the performance loss.

In practical applications, the detected signal echo is contaminated not only by natural clutter but also by man-made non-homogeneous interference, so the independent and identically distributed condition is often not satisfied.

To deal with interfering targets in the samples, Melvin first proposed the idea of the non-homogeneity detector (NHD), which rejects samples containing interfering targets in order to suppress their influence on the clutter covariance matrix estimate. The basic idea of NHD is to define a test statistic, based on the difference in statistical characteristics between samples contaminated by interfering targets and the other samples, that distinguishes the two kinds of samples.

For the choice of NHD test statistic, Gerlach et al. of the US Naval Research Laboratory proposed two criteria: the generalized inner product (GIP) and the adaptive power residue. Let $X_l$ denote the l-th sample of the initial samples; its autocorrelation matrix is expressed as $E[X_l X_l^H] = T$, where T is the clutter-plus-noise covariance matrix. Let $\hat{R} = \frac{1}{L}\sum_{l=1}^{L} X_l X_l^H$ denote the sample covariance matrix formed from L samples; the GIP value of each sample can then be expressed as $\mathrm{GIP}_l = X_l^H \hat{R}^{-1} X_l$. Interfering targets can be rejected effectively according to the GIP value of each sample.

The clutter suppression capability of the GIP non-homogeneous detection method depends on the size of the sample population: the larger the sample size, the closer the estimated clutter covariance matrix is to the true one and the stronger the clutter suppression. When the GIP non-homogeneous detection method is implemented in software, it suffers from limited precision and long computation time on large sample sets, which makes it difficult to meet the high real-time requirements of practical non-homogeneous detection.

Invention content

The purpose of the present invention is to overcome the shortcomings described in the background above by providing a parallel generalized inner product reconfigurable controller that better meets the practical demands of high real-time performance and large-point-count computation. This is achieved by the following technical solution:

The parallel generalized inner product reconfigurable controller comprises:

An intermediate result computing module, which receives source data, calculates the intermediate result vector Y_L from the source data, generates the address of vector Y_L, and stores it in a bank; each time the calculation of an intermediate result vector Y_L is completed, it generates a completion signal and sends it to the final result computing module as an enable signal;

A final result computing module, in which an address generator continuously generates the addresses of the elements of row X_L of matrix X and of the elements of the corresponding intermediate result vector Y_L; the data are read into a complex multiply-accumulator to obtain the L-th element Z_L of the result matrix Z_1xN, the address of Z_L is generated, and the result is stored in a bank;

A storage data address processing module, which performs data selection according to the ping-pong operation selection signal and, at the same time, handles signals from the intermediate result computing module and the final result computing module that target the same bank, so as to generate correct bank address signals.

In a further design of the hardware implementation of the parallel generalized inner product operation, Y_L is calculated by multiply-accumulating X_L with each column of the square matrix T; the number of rows and columns of T equals the number of columns of matrix X, and the multiply-accumulate is carried out by multi-way parallel computation.
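As a rough illustration of this multiply-accumulate, the following Python sketch computes Y_L by accumulating the products of X_L with each column of T, taking the columns in groups that stand in for the parallel lanes. The names `mac` and `intermediate_vector`, the list-of-rows layout of T, the default of four lanes, and the absence of any conjugation are assumptions made for illustration; the source does not specify them.

```python
def mac(xs, ts):
    """Complex multiply-accumulate: sum of element-wise products."""
    acc = 0j
    for x, t in zip(xs, ts):
        acc += x * t
    return acc


def intermediate_vector(x_l, T, ways=4):
    """Compute Y_L by multiply-accumulating x_l with every column of T.

    T is a square matrix given as a list of rows (assumed layout).
    Columns are processed in groups of `ways`; in hardware each lane of a
    group would run concurrently, here the lanes simply run one after another.
    """
    m = len(T)
    if len(x_l) != m or any(len(row) != m for row in T):
        raise ValueError("x_l must have as many elements as T has rows and columns")
    y = []
    for start in range(0, m, ways):
        group = range(start, min(start + ways, m))
        for j in group:
            column_j = [T[i][j] for i in range(m)]
            y.append(mac(x_l, column_j))
    return y
```

A software model like this is only useful for checking numerical results; it says nothing about the timing of the actual lanes.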

In a further design of the hardware implementation of the parallel generalized inner product operation, the intermediate result computing module is implemented in a four-way parallel manner.

In a further design of the hardware implementation of the parallel generalized inner product operation, the source data of the intermediate result computing module are stored as follows: matrix T is stored by row in bank0-bank3 and, once these are full, continues by row in bank4-bank7; matrix X is stored by row in bank8-bank11.

In a further design of the hardware implementation of the parallel generalized inner product operation, the intermediate results of the intermediate result computing module are stored as follows: odd-numbered terms are stored in bank12 and even-numbered terms in bank13.

In a further design of the hardware implementation of the parallel generalized inner product operation, the intermediate result computing module calculates an intermediate result as follows: during one operation, the address generator first generates the addresses of one column of elements X_L of X and of four columns of T matrix elements, and the corresponding matrix element data are fetched and fed into the complex multiply-accumulators to obtain the intermediate result Y_L; the address generator then generates the intermediate result storage address, and the intermediate result is stored in a bank.

In a further design of the hardware implementation of the parallel generalized inner product operation, the final result computing module calculates a final result as follows: when the final result computing module receives the intermediate result completion signal, the address generator continuously generates the addresses of the elements of row X_L of matrix X and of the elements of the corresponding intermediate result vector Y_L; the data are fed together into the complex multiply-accumulator to obtain the final result Z_L, the address generator generates the final result storage address, and the final result is stored in a bank.

In a further design of the hardware implementation of the parallel generalized inner product operation, the complex multiplier is a pipelined single-precision floating-point arithmetic unit with a latency of 4 clock cycles, and the memory access latency of the complex multiplier is set to 6 cycles.

In a further design of the hardware implementation of the parallel generalized inner product operation, there are five complex multiply-accumulators: four compute the intermediate results four ways in parallel, and the fifth computes the final result at the same time.

In a further design of the hardware implementation of the parallel generalized inner product operation, each complex multiply-accumulator consists of one complex multiplier and three complex adders, and its area after DC synthesis in a 40nm CMOS process is 19993.56 μm².

Advantages of the present invention

The parallel generalized inner product reconfigurable controller provided by the invention adopts the strategy of computing one final result element immediately after each intermediate result is computed, so that the time to compute Z_(L-1) is hidden within the time to compute Y_L; the computation time is short and the storage resource utilization is high. The controller meets the high real-time requirement for obtaining test statistics when performing non-homogeneous detection in many signal detection application scenarios.

Description of the drawings

Fig. 1 is a schematic diagram of the architecture of the parallel generalized inner product reconfigurable controller.

Fig. 2 is a schematic diagram of the data storage for the parallel generalized inner product.

Fig. 3 is a schematic diagram of the calculation flow of the parallel generalized inner product algorithm.

Specific implementation mode

The present invention is described in detail below with reference to the drawings and a specific embodiment.

As shown in Fig. 1, the parallel generalized inner product reconfigurable controller of this embodiment, taking four-way parallelism as an example, consists mainly of three submodules: the intermediate result computing module, the final result computing module, and the storage data address processing module. The intermediate result computing module calculates the intermediate results; the final result computing module calculates the final results; the storage data address processing module handles bank addresses and related signals.

The intermediate result computing module calculates the intermediate result vector Y_L in a fully pipelined manner: it generates the addresses of the elements of column X_L, performs the inner-product multiply-accumulate operation between column X_L and each column of the square matrix T_MxM to obtain the intermediate result vector Y_L, generates the address of Y_L, and stores it in a bank. Each time the calculation of a Y_L is completed, it sends a completion signal to the final result computing module as the enable signal for one calculation by that module.

In the final result computing module, an address generator continuously generates the addresses of the elements of row X_L of matrix X and of the elements of the corresponding intermediate result vector Y_L; the data are read into a complex multiply-accumulator to obtain the L-th element Z_L of the result matrix Z_1xN, the address of Z_L is generated, and the result is stored in a bank.

The storage data address processing module performs data selection according to the ping-pong operation selection signal and, at the same time, handles signals from the intermediate result computing module and the final result computing module that target the same bank, so as to generate correct bank addresses and related signals.

As shown in Fig. 1, the storage unit comprises 15 banks: matrix T is stored in bank0-7, matrix X in bank8-11, the intermediate results Y_L in bank12 and bank13, and the final parallel generalized inner product result matrix in bank14. The arithmetic unit comprises 5 complex multiply-accumulators: complex multiply-accumulators 0-3 compute the intermediate results four ways in parallel, and complex multiply-accumulator 4 computes the final result at the same time.

Fig. 2 shows the data storage for the parallel generalized inner product. The source data are stored as follows: matrix T is stored by row in bank0-bank3 and, once these are full, continues by row in bank4-bank7; matrix X is stored by row in bank8-bank11. This layout makes it convenient to run the 4-way parallel operation when calculating the intermediate result Y_L, and it also simplifies the design of the corresponding DMA module. For the intermediate results Y_L, the odd-numbered terms Y₁, Y₃, ... are stored in bank12 (each overwriting the previous one), and the even-numbered terms Y₂, Y₄, ... are stored in bank13 (each overwriting the previous one). The final generalized inner product result matrix is stored in bank14.
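The bank assignment just described can be summarised in a small Python sketch. The constant and function names (`T_BANKS`, `intermediate_bank`, `store_intermediate`) and the dictionary used to stand in for the banks are hypothetical; only the bank numbers and the odd/even overwrite rule come from the text.

```python
# Bank numbers as listed in the description.
T_BANKS = list(range(0, 8))    # bank0-bank7 hold matrix T
X_BANKS = list(range(8, 12))   # bank8-bank11 hold matrix X
Y_ODD_BANK = 12                # Y1, Y3, ... (each overwrites the previous)
Y_EVEN_BANK = 13               # Y2, Y4, ... (each overwrites the previous)
Z_BANK = 14                    # final result matrix


def intermediate_bank(l):
    """Return the bank that receives Y_l, where l is the 1-based index."""
    if l < 1:
        raise ValueError("l is 1-based")
    return Y_ODD_BANK if l % 2 == 1 else Y_EVEN_BANK


def store_intermediate(banks, l, y_l):
    """Write Y_l into its bank, replacing whatever was stored there."""
    banks[intermediate_bank(l)] = (l, list(y_l))


banks = {}
for l, y in enumerate([[1j], [2j], [3j]], start=1):
    store_intermediate(banks, l, y)
# bank12 now holds Y3 (Y1 was overwritten); bank13 still holds Y2.
```

One plausible reading of this alternation is that Y_(L-1) stays readable in one bank while Y_L is being written to the other, which is what the overlapped calculation of Z_(L-1) needs.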

As shown in Fig. 3, the parallel generalized inner product algorithm calculates an intermediate result as follows: during one operation, address generator 1 first generates the addresses of one column of elements X_L of X and of four columns of T matrix elements, and the corresponding matrix element data are fetched and fed into the complex multiply-accumulators to obtain the intermediate result Y_L; address generator 2 then generates the intermediate result storage address, and the intermediate result is stored in a bank.

Similarly, the parallel generalized inner product algorithm calculates a final result as follows: during one operation, when the module receives the intermediate result completion signal, address generator 1 continuously generates the addresses of the elements of row X_L of matrix X and of the elements of the corresponding intermediate result vector Y_L. The data are fed together into the complex multiply-accumulator to obtain the final result Z_L; address generator 2 then generates the final result storage address, and the final result is stored in a bank.

One complete calculation in the hardware implementation of the parallel generalized inner product algorithm of the present invention comprises the following steps:

Step 1) Set L=1 and start the calculation from the first row of matrix X;

Step 2) Calculate the intermediate result Y_L.

Calculating the intermediate result Y_L comprises the following steps:

Step 2-1) According to the addresses generated by the address generator submodule, fetch the elements of X_L and (T₁ T₂ T₃ T₄) in turn and feed them into the multiply-accumulate submodule for complex multiply-accumulate operations, obtaining (Y_L1 Y_L2 Y_L3 Y_L4);

Step 2-2) According to the addresses generated by the address generator submodule, write (Y_L1 Y_L2 Y_L3 Y_L4) in turn into the intermediate result bank, while fetching the next group of four columns of T matrix elements and X_L; repeat 1) and 2) until the calculation of Y_L is complete;

Step 3) Calculate the final result Z_L. This is carried out synchronously with 1) and 2): if Y_(L-1) has been generated, fetch the elements of X_(L-1) and Y_(L-1) in turn according to the addresses generated by the address generator and perform complex multiply-accumulation to obtain Z_(L-1), then write the final result into the final result bank according to the address generated by the address generator;

Step 4) If L<N, set L=L+1 and jump to step 2);

Step 5) Fetch the elements of X_N and Y_N in turn and perform complex multiply-accumulation to obtain Z_N, store it in a bank, and complete the inner product operation.
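The control flow of steps 1) to 5) can be sketched in Python as follows. This is a sequential software model under stated assumptions, not the hardware itself: `gip_schedule` and `dot` are hypothetical names, X is assumed to be a list of N sample vectors, T a list of rows, no conjugation is applied, and the `trace` list merely records which results would be produced in the same time slot.

```python
def dot(a, b):
    """Complex multiply-accumulate of two equal-length sequences."""
    return sum((p * q for p, q in zip(a, b)), 0j)


def gip_schedule(X, T):
    """Follow steps 1)-5): Z_(L-1) is produced while Y_L is being computed.

    Returns (Z, trace); trace[k] lists the results of the k-th time slot.
    """
    if not X:
        raise ValueError("X must contain at least one sample")
    n, m = len(X), len(T)
    Z = [None] * n
    trace = []
    y_prev = None
    for l in range(1, n + 1):              # step 1) L=1, step 4) L=L+1
        x_l = X[l - 1]
        # Step 2) Y_L: multiply-accumulate x_l with every column of T.
        y_l = [dot(x_l, [T[i][j] for i in range(m)]) for j in range(m)]
        slot = [f"Y{l}"]
        # Step 3) overlapped with step 2) in hardware: Z_(L-1) from Y_(L-1).
        if y_prev is not None:
            Z[l - 2] = dot(X[l - 2], y_prev)
            slot.append(f"Z{l - 1}")
        trace.append(slot)
        y_prev = y_l
    # Step 5) the last element has no later Y calculation to hide behind.
    Z[n - 1] = dot(X[n - 1], y_prev)
    trace.append([f"Z{n}"])
    return Z, trace
```

For two samples the trace reads `[["Y1"], ["Y2", "Z1"], ["Z2"]]`, which shows that only the final element Z_N needs a time slot of its own.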

The complex multipliers and complex adders used in the parallel generalized inner product reconfigurable controller of this embodiment are pipelined single-precision floating-point arithmetic units with a latency of 4 clock cycles; the memory access latency is 6 cycles. Using EDA simulation and synthesis tools, the operating frequency reaches 1GHz.

The parallel generalized inner product reconfigurable controller of this embodiment uses five complex multiply-accumulators in total: four compute the intermediate results four ways in parallel, and the fifth computes the final result at the same time. Each complex multiply-accumulator consists of one complex multiplier and three complex adders, and its area after DC synthesis in a 40nm CMOS process is 19993.56 μm².

The parallel generalized inner product reconfigurable controller of this embodiment adopts the strategy of computing one final result element immediately after each intermediate result is computed, so that the time to compute Z_(L-1) is hidden within the time to compute Y_L. Compared with computing the final results in parallel only after all intermediate results have been computed, the computation time is shorter and the storage resource utilization is higher.
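The comparison can be made concrete with an idealised cycle count. The model below is an assumption-laden sketch: it charges one cycle per multiply-accumulate step, ignores the 4-cycle pipeline latency and the 6-cycle memory access latency, and assumes the deferred scheme computes the N final results on the same number of parallel lanes. All function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def y_cycles(m, ways=4):
    """Cycles for one Y_L: ceil(m / ways) column groups of m steps each."""
    return ceil_div(m, ways) * m


def z_cycles(m):
    """Cycles for one Z_L: a single m-element multiply-accumulate."""
    return m


def overlapped_cycles(n, m, ways=4):
    """Z_(L-1) hides behind Y_L, so only the last Z_N adds time."""
    return n * y_cycles(m, ways) + z_cycles(m)


def deferred_cycles(n, m, ways=4):
    """All Y_L first, then the n final results on `ways` parallel lanes."""
    return n * y_cycles(m, ways) + ceil_div(n, ways) * z_cycles(m)


def intermediate_words(n, m, overlapped):
    """Complex words of intermediate storage: two vectors versus all n."""
    return 2 * m if overlapped else n * m
```

Under these assumptions, with n = 1024 samples of m = 16 elements the overlapped scheme takes 65552 cycles against 69632 for the deferred one, and it keeps 32 intermediate words instead of 16384; real figures would depend on the latencies this model leaves out.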

The parallel generalized inner product reconfigurable controller of this embodiment is characterised by fast computation, a flexible and variable number of points, and high storage resource utilization. It meets the high real-time requirement for obtaining test statistics in digital signal processing with large data volumes, for example when performing non-homogeneous detection in real-time signal detection application scenarios.

The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any variation or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall be covered by the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of the claims.

Claims

1. A parallel generalized inner product reconfigurable controller, characterised in that it comprises:

An intermediate result computing module, which receives source data, calculates the intermediate result vector Y_L from the source data, generates the address of vector Y_L, and stores it in a bank; each time the calculation of an intermediate result vector Y_L is completed, it generates a completion signal and sends it to the final result computing module as an enable signal;

A final result computing module, in which an address generator continuously generates the addresses of the elements of row X_L of matrix X and of the elements of the corresponding intermediate result vector Y_L; the data are read into a complex multiply-accumulator to calculate the L-th element Z_L of the result matrix Z_1xN, the address of Z_L is generated, and the result is stored in a bank;

A storage data address processing module, which performs data selection according to the ping-pong operation selection signal and, at the same time, handles signals from the intermediate result computing module and the final result computing module that target the same bank, so as to generate correct bank address signals.

2. The hardware implementation method of the parallel generalized inner product operation according to claim 1, characterised in that: Y_L is calculated by multiply-accumulating X_L with each column of the square matrix T; the number of rows and columns of T equals the number of columns of matrix X, and the multiply-accumulate is carried out by multi-way parallel computation.

3. The parallel generalized inner product reconfigurable controller according to claim 2, characterised in that: the intermediate result computing module is implemented in a four-way parallel manner.

4. The parallel generalized inner product reconfigurable controller according to claim 3, characterised in that: the source data of the intermediate result computing module are stored as follows: matrix T is stored by row in bank0-bank3 and, once these are full, continues by row in bank4-bank7; matrix X is stored by row in bank8-bank11.

5. The parallel generalized inner product reconfigurable controller according to claim 3, characterised in that: the intermediate results of the intermediate result computing module are stored as follows: odd-numbered terms are stored in bank12 and even-numbered terms in bank13.

6. The parallel generalized inner product reconfigurable controller according to claim 1, characterised in that: the intermediate result computing module calculates an intermediate result as follows: during one operation, the address generator first generates the addresses of one column of elements X_L of X and of four columns of T matrix elements, and the corresponding matrix element data are fetched and fed into the complex multiply-accumulators to obtain the intermediate result Y_L; the address generator then generates the intermediate result storage address, and the intermediate result is stored in a bank.

7. The parallel generalized inner product reconfigurable controller according to claim 1, characterised in that: the final result computing module calculates a final result as follows: when the final result computing module receives the intermediate result completion signal, the address generator continuously generates the addresses of the elements of row X_L of matrix X and of the elements of the corresponding intermediate result vector Y_L; the data are fed together into the complex multiply-accumulator to obtain the final result Z_L, the address generator generates the final result storage address, and the final result is stored in a bank.

8. The parallel generalized inner product reconfigurable controller according to claim 1, characterised in that: the complex multiplier is a pipelined single-precision floating-point arithmetic unit with a latency of 4 clock cycles, and the memory access latency of the complex multiplier is set to 6 cycles.

9. The parallel generalized inner product reconfigurable controller according to claim 1, characterised in that: there are five complex multiply-accumulators, of which four compute the intermediate results four ways in parallel and the fifth computes the final result at the same time.

10. The parallel generalized inner product reconfigurable controller according to claim 1, characterised in that: each complex multiply-accumulator consists of one complex multiplier and three complex adders, and its area after DC synthesis in a 40nm CMOS process is 19993.56 μm².