CN114020240A - Time domain convolution computing device and method for realizing clock domain crossing based on FPGA

Info

Publication number: CN114020240A
Application number: CN202111304060.9A
Authority: CN
Inventors: 游斌相; 廖育富; 刘泽
Original assignee: Sichuan Jiuzhou ATC Technology Co Ltd
Current assignee: Sichuan Jiuzhou ATC Technology Co Ltd
Priority date: 2021-11-05
Filing date: 2021-11-05
Publication date: 2022-02-08

Abstract

The invention discloses an FPGA (field programmable gate array) based device and method for time domain convolution calculation across clock domains, comprising RAM_coef, FIFO_coef and FIFO_conv implemented on the FPGA. RAM_coef and the (M-1) FIFO_coef form a pipeline structure through which the coefficients are passed in sequence according to the signal timing; RAM_coef and the (M-1) FIFO_coef correspond one-to-one with the M multipliers and supply each multiplier with the convolution coefficients for its time sequence signal. The M multipliers and their corresponding adders work in parallel, so (M-1) intermediate multiply-add results are generated before the next signal arrives, and M FIFO_conv store the intermediate multiply-add results of the M signal paths. The invention meets application requirements for high real-time performance and low output delay while consuming few resources.

Description

Time domain convolution computing device and method for realizing clock domain crossing based on FPGA

Technical Field

The invention belongs to the technical field of programmable device application, and particularly relates to a device and a method for realizing time domain convolution calculation across clock domains based on an FPGA (field programmable gate array).

Background

Convolution is widely used in engineering and mathematics. Statistically, the weighted moving average is a convolution. In probability theory, the probability density function of the sum of two statistically independent variables X and Y is the convolution of the probability density functions of X and Y. In electronic engineering and signal processing, the output of any linear system can be obtained by convolving the input signal with a system function (the impulse response of the system).

The convolution operation can be regarded as a multiply-add operation. If the convolution coefficient sequence is long, a time domain multiply-add implementation on the FPGA occupies many multiplier and adder resources, which affects the resource allocation and optimization of the whole system. Time domain convolution on an FPGA is also likely to encounter a clock domain crossing problem: the clock domain of the input signal does not match the clock domain of the signal processing (convolution calculation). In that case the two clock domains must first be unified into one clock domain, and only then can the convolution be computed.

At present, the mainstream method for realizing convolution calculation on an FPGA is frequency domain convolution calculation: an FFT is performed on the signal and on the convolution coefficients, the two are multiplied in the frequency domain, and the result is returned to the time domain by an IFFT. The advantage of this approach is that it is not affected by the clock domain and saves multiplier resources. However, the method must convert back and forth between the time domain and the frequency domain, and these conversions consume a lot of time, so the method suffers from high delay and low real-time performance. In applications with extremely high real-time requirements in particular, convolution realized by the frequency domain method can hardly meet practical needs.

In addition, conventional time domain convolution calculation on an FPGA occupies many multiplier and adder resources.

Disclosure of Invention

In order to solve the problems of high delay and low real-time performance of the conventional method for realizing convolution calculation on the FPGA, the invention provides a time domain convolution calculation device for realizing clock domain crossing based on the FPGA. The invention can meet the application requirements of high real-time performance and less output delay and has the characteristic of less resource consumption.

The invention is realized by the following technical scheme:

the device for realizing the time domain convolution calculation of the cross-clock domain based on the FPGA comprises RAM_coef, FIFO_coef and FIFO_conv, all implemented on the FPGA;

RAM_coef is a RAM in which the convolution coefficients are stored in advance;

FIFO_coef denotes the (M-1) buffer FIFOs for storing convolution coefficients;

FIFO_conv is a buffer FIFO for storing the intermediate results of the multiply-add operation, and there are M FIFO_conv;

RAM_coef and the (M-1) FIFO_coef form a pipeline structure through which the coefficients are passed in sequence according to the signal timing, and RAM_coef and the (M-1) FIFO_coef correspond one-to-one with the M multipliers and supply each multiplier with the convolution coefficients for its time sequence signal;

the M adders correspond one-to-one with the M multipliers, the M multipliers and their corresponding adders work in parallel, (M-1) intermediate multiply-add results are generated before the next signal arrives, and M FIFO_conv store the intermediate multiply-add results of the M signal paths.

Preferably, the apparatus of the present invention further comprises an input signal interface Sig_din;

the input signal interface Sig_din is used for receiving the time sequence signals and distributing them to the corresponding multipliers.

Preferably, when the X-th signal is input and 1 ≤ X ≤ M, the X-th signal is input into the X-th multiplier for operation;

when the X-th signal is input and M < X ≤ 2M, all multiplication operations of the (X-M)-th signal have just finished, so the (X-M)-th multiplier is idle and can be used to calculate the multiplication of the X-th signal;

this cycle repeats, realizing the multiplexing of the M multipliers.

Preferably, the X-th adder adds the multiplication result output by the X-th multiplier to the intermediate multiply-add result stored in the (X-1)-th FIFO_conv to obtain the X-th intermediate multiply-add result; the first signal of this result is output as the convolution calculation result, and the remaining signals are stored in the X-th FIFO_conv(X), where 1 ≤ X ≤ M.

Preferably, when the X-th signal is input and 1 ≤ X ≤ M, the X-th signal is subjected to the multiply-add operation by the X-th multiplier and the X-th adder to obtain the multiply-add result of the X-th signal; the first signal of this result is output as the convolution calculation result, and the remaining signals are stored in the X-th FIFO_conv(X);

when the X-th signal is input and M < X ≤ 2M, the multiply-add operation of the (X-M)-th signal has just finished and the data in the (X-M)-th FIFO_conv has been read out and used, so FIFO_conv(X-M) is used to store the multiply-add result of the X-th signal;

this cycle repeats, realizing the multiplexing of the M adders and the M FIFO_conv.

Preferably, the length of each of the (M-1) FIFO_coef is greater than N;

N is the ratio of the signal processing clock frequency to the input signal clock frequency.

Preferably, the method of the present invention comprises:

when a signal is input, the convolution coefficients are read cyclically and in sequence from RAM_coef; the read coefficients are multiplied by signals No. 1, M+1, 2M+1, … to obtain the intermediate result mult1, and the coefficients are written into FIFO_coef(1);

after N clocks, signals No. 2, M+2, 2M+2, … are input; the coefficients are then read out from FIFO_coef(1) and multiplied by signals No. 2, M+2, 2M+2, … to obtain the intermediate result mult2, and the coefficients are written into FIFO_coef(2);

and so on recursively, until the coefficients are read out from the last stage FIFO_coef(M-1) and multiplied by signals No. M, 2M, 3M, … to obtain the intermediate result multM.

Preferably, the method of the present invention further comprises:

adding the multiplication result of the 1st signal to the output of the M-th FIFO_conv(M) to obtain the intermediate multiply-add result conv_tmp(1) of the 1st signal, outputting the first signal in conv_tmp(1) as the convolution calculation result, and storing the remaining signals in the 1st FIFO_conv(1);

adding the multiplication result of the 2nd signal to the output of the 1st FIFO_conv(1) to obtain the intermediate multiply-add result conv_tmp(2) of the 2nd signal, outputting the first signal in conv_tmp(2) as the convolution calculation result, and storing the remaining signals in the 2nd FIFO_conv(2);

and so on recursively: adding the multiplication result of the M-th signal to the output of the (M-1)-th FIFO_conv(M-1) to obtain the intermediate multiply-add result conv_tmp(M) of the M-th signal, outputting the first signal in conv_tmp(M) as the convolution calculation result, and storing the remaining signals in the M-th FIFO_conv(M);

when the (M+1)-th signal is input, M × N clocks have passed, the multiply-add operation of the 1st signal has just finished, and the data in FIFO_conv(1) has been read out and used; the addition for this signal reuses the adder used by the 1st signal, and the intermediate result is stored in the 1st FIFO_conv(1);

this cycle repeats, realizing the multiplexing of the M adders and the M FIFO_conv.

In a third aspect, the invention provides a real-time data processing system that uses the above FPGA-based cross-clock-domain time domain convolution computing device to perform convolution calculation on time sequence signals.

In a fourth aspect, the invention provides a radar data processing system that uses the above FPGA-based cross-clock-domain time domain convolution computing device to perform convolution calculation on radar signals.

The invention has the following advantages and beneficial effects:

1. Compared with frequency domain convolution calculation, the invention reduces delay and improves real-time performance.

2. Compared with conventional time domain convolution calculation, the invention reduces the consumption of multiplier and adder resources.

3. The invention can be widely popularized and used in application occasions with high real-time processing requirements and limited resources, and is particularly suitable for scenes such as data processing of a radar system.

Drawings

The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:

FIG. 1 is a schematic block diagram of the apparatus of the present invention.

FIG. 2 is a timing diagram of the relationship between the input signal and the convolution coefficients and the clock according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.

Examples

The embodiment provides a time domain convolution calculating device for realizing clock domain crossing based on an FPGA.

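The convolution calculation principle is as follows:

$$y[n] = \sum_{k=0}^{N_{\mathrm{coef}}-1} h[k]\, x[n-k]$$

In the formula, h[n] is the convolution coefficient, x[n] is the signal to be convolved, y[n] is the convolution output, and N_coef is the number of convolution coefficients. As can be seen from the formula:

1) The convolution calculation process consists mainly of multiplication and addition.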

2) Each signal to be convolved is multiplied by all coefficients.

3) The current output value is related to the historical input value.
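As a point of reference, the three observations above can be checked against a direct evaluation of the convolution sum y[n] = Σ h[k]·x[n−k]. The following is a minimal Python sketch and is not taken from the patent; the function name `direct_convolve`, the output name y and the zero initial condition (x[n] = 0 for n < 0) are assumptions made for illustration.

```python
def direct_convolve(x, h):
    """Direct evaluation of y[n] = sum_k h[k] * x[n - k] for n = 0 .. len(x) - 1.

    Assumes x[n] = 0 for n < 0 (illustrative initial condition).
    """
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(len(h)):
            if n - k >= 0:               # inputs before the start are taken as zero
                acc += h[k] * x[n - k]   # one multiplication and one addition per term
        y.append(acc)
    return y


# Every input sample meets every coefficient, and each output depends on past inputs.
print(direct_convolve([1, 2, 3, 4], [1, 10, 100]))   # [1, 12, 123, 234]
```

Evaluated this way, each output needs one multiply-add per coefficient, which is the resource cost the embodiment sets out to reduce.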

In order to multiplex the multipliers and adders as many times as possible, the multiplication and addition for one signal must be completed as quickly as possible, so that the multiplier and adder can then be used for other signals. Meanwhile, to minimize the accumulation of intermediate results, this embodiment designs the whole calculation process as a pipeline, so that the intermediate results of the multipliers and adders are always kept in an "optimal state": the current input signal needs only one multiplication and one addition with the latest stored result to obtain the current convolution result, and the multiply-add results of historical inputs need not be recomputed.

In this embodiment, the clock frequency of the signal processing (convolution calculation) is N times the clock frequency of the input signal (N is a positive integer), and the number of convolution coefficients is M × N (M is a positive integer); if there are fewer coefficients, zeros are padded at the tail. Because the signal processing clock is N times faster than the input signal clock, N coefficients can be read out and multiply-added during each input signal period. This clock difference is used to multiplex the multipliers and adders, reducing the multiplier and adder resources by a factor of N.
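The relationship between the coefficient count, N and M can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the helper name `pad_coefficients` and the example numbers are hypothetical, and the patent does not prescribe how the padding is carried out.

```python
def pad_coefficients(h, n_ratio):
    """Zero-pad h at the tail so that its length is a multiple of n_ratio (N).

    Returns (padded_coefficients, M), where M = len(padded) // N is the
    number of multiplier/adder lanes.
    """
    if n_ratio < 1 or not h:
        raise ValueError("need N >= 1 and at least one coefficient")
    m_lanes = -(-len(h) // n_ratio)                        # ceiling division
    padded = list(h) + [0] * (m_lanes * n_ratio - len(h))  # zeros at the tail
    return padded, m_lanes


# Example: 10 coefficients with a processing clock 4x the input clock.
padded, m_lanes = pad_coefficients(list(range(1, 11)), 4)
print(len(padded), m_lanes)   # 12 3
```

With these example numbers, a fully parallel time domain design would need one multiplier per coefficient (12), whereas the scheme described here uses M = 3.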

As shown in fig. 1, the apparatus of this embodiment includes RAM_coef, FIFO_coef and FIFO_conv implemented on the FPGA;

RAM_coef is a RAM storing the convolution coefficients.

FIFO_coef is a buffer FIFO storing convolution coefficients, and there are (M-1) of them, namely FIFO_coef(1), FIFO_coef(2), …, FIFO_coef(M-1) shown in fig. 1.

FIFO_conv is a buffer FIFO storing multiply-add calculation results, and there are M of them, namely FIFO_conv(1), FIFO_conv(2), …, FIFO_conv(M) shown in fig. 1.

In fig. 1, Sig_din is the input signal interface for receiving time sequence signals; 1, 2, …, M, M+1, M+2, … are the input signal serial numbers; mult1 to multM are the multiplication results; conv_tmp is an intermediate multiply-add result; FIFO_conv is a buffer FIFO storing multiply-add results.

RAM_coef and the (M-1) FIFO_coef form a structure (pipeline structure) through which the coefficients are passed in sequence according to the signal timing, and RAM_coef and the (M-1) FIFO_coef correspond one-to-one with the M multipliers and supply each multiplier with the convolution coefficients for its time sequence signal;

the M paths of time sequence signals correspond one-to-one with the M multipliers, the M multipliers and their corresponding adders work in parallel, (M-1) intermediate results are generated before the next signal arrives, and M FIFO_conv store the intermediate multiply-add results of the M signal paths.

The convolution coefficients are stored in one RAM, and (M-1) FIFOs buffer the coefficients, so that M signals can be multiplied simultaneously. When the X-th signal (M < X ≤ 2M) is input, all multiplication operations of the (X-M)-th signal have just finished, so the (X-M)-th multiplier is idle and can be used to calculate the multiplication of the X-th signal. Repeating this cycle keeps the convolution calculation continuous and maximizes the number of times each multiplier is reused.

As shown in fig. 2, 1, 2, …, M, M+1, M+2, … are the input signal numbers, and each signal requires M × N multiply-add operations. As can be seen from fig. 2, when the (M+1)-th signal is input, M × N clocks have passed and the multiply-add operation of the 1st signal has just finished. The multiplier and adder used by the 1st signal are then idle and can be reused by the (M+1)-th signal. By analogy, the M multipliers and adders are multiplexed, reducing the multiplier and adder resources by a factor of N.
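The timing argument can be made concrete with a small scheduling sketch in Python. It is an illustration only: the function name `lane_schedule` is hypothetical, and counting processing clocks from 0 at the arrival of signal 1 is a convention assumed here rather than one taken from fig. 2.

```python
def lane_schedule(x_index, m_lanes, n_ratio):
    """Return (lane, first_clock, last_clock) for the X-th input signal (X >= 1).

    Clocks are counted in the processing clock domain, starting at 0 when
    signal 1 arrives; a new signal arrives every N processing clocks.
    """
    lane = (x_index - 1) % m_lanes + 1                   # signals X and X+M share a lane
    first_clock = (x_index - 1) * n_ratio
    last_clock = first_clock + m_lanes * n_ratio - 1     # M*N multiply-adds per signal
    return lane, first_clock, last_clock


M, N = 3, 4
for x_index in range(1, 6):
    print(x_index, lane_schedule(x_index, M, N))
```

For M = 3 and N = 4, signal 1 occupies lane 1 during clocks 0 to 11 and signal 4 (that is, M+1) takes over lane 1 at clock 12, so the lane is handed over with no idle clock and no overlap.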

In this embodiment, the convolution calculation is performed based on the device architecture shown in fig. 1, and the specific process includes:

Before the signal arrives, the convolution coefficients are stored in RAM_coef, and (M-1) FIFO_coef, each with a length greater than N, are defined to buffer the convolution coefficients.

When a signal is input, the convolution coefficients stored in advance are read cyclically and in sequence from RAM_coef; the read coefficients are multiplied by signals No. 1, M+1, 2M+1, … to obtain the intermediate result mult1, and the coefficients are written into FIFO_coef(1).

After N clocks, signals No. 2, M+2, 2M+2, … are input; the coefficients are then read out from FIFO_coef(1) and multiplied by signals No. 2, M+2, 2M+2, … to obtain the intermediate result mult2, and the coefficients are written into FIFO_coef(2).
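The coefficient path described in the steps above can be sketched as a cycle-level Python model. This is a behavioural illustration, not the FPGA implementation: the function name `simulate_coef_pipeline`, the 0-based lane numbering and the choice to model each write as happening before the read in the same clock are assumptions.

```python
from collections import deque


def simulate_coef_pipeline(h, m_lanes, n_ratio, n_clocks):
    """Cycle-level sketch of RAM_coef feeding a chain of (M-1) FIFO_coef.

    Returns (seen, max_depth): seen[l][t] is the coefficient used by lane l
    (0-based) on processing clock t, or None while the lane is still idle;
    max_depth is the largest FIFO occupancy observed.
    """
    assert len(h) == m_lanes * n_ratio
    fifos = [deque() for _ in range(m_lanes - 1)]
    seen = [[] for _ in range(m_lanes)]
    max_depth = 0
    for t in range(n_clocks):
        for lane in range(m_lanes):
            if t < lane * n_ratio:                 # each lane starts N clocks after the previous one
                seen[lane].append(None)
                continue
            if lane == 0:
                coef = h[t % len(h)]               # circular read of RAM_coef
            else:
                coef = fifos[lane - 1].popleft()   # read from the previous stage
            seen[lane].append(coef)
            if lane < m_lanes - 1:
                fifos[lane].append(coef)           # pass the coefficient down the chain
                max_depth = max(max_depth, len(fifos[lane]))
    return seen, max_depth


seen, depth = simulate_coef_pipeline([0, 1, 2, 3, 4, 5], m_lanes=3, n_ratio=2, n_clocks=12)
print(seen[1])   # lane 1 sees the same sequence as lane 0, delayed by N = 2 clocks
print(depth)     # 3, i.e. N + 1 under the write-before-read assumption
```

In this model each FIFO holds at most N + 1 entries, which agrees with the requirement that the FIFO_coef length be greater than N.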

In the convolution calculation process, the output result for the current signal is the multiplication result of the current signal added to the multiplication results of the previous (N_coef − 1) signals (N_coef is the number of convolution coefficients). Since the signal processing clock is faster than the signal input clock and M multipliers are working, (M-1) intermediate results are generated before the next signal arrives; M FIFOs are therefore defined to store the intermediate multiply-add results.

The multiplication result multX of the X-th signal is added to the multiply-add result of the (X-1)-th signal to obtain the intermediate result conv_tmp(X); the first signal in conv_tmp(X) is output as the convolution calculation result, and the remaining signals are stored in FIFO_conv(X).
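At the level of whole signals (ignoring clocks and lanes), this recursion can be sketched in a few lines of Python. The sketch is an assumption-laden illustration: the function name `streaming_convolve` and the list `stored`, which stands in for whichever FIFO_conv holds the previous signal's leftovers, are hypothetical.

```python
def streaming_convolve(x, h):
    """Sample-level model of the conv_tmp / FIFO_conv recursion.

    `stored` plays the role of FIFO_conv: the partial sums left over after
    the previous signal, i.e. its conv_tmp without the first element.
    """
    taps = len(h)
    stored = [0] * (taps - 1)              # FIFO_conv output is all zeros at the start
    outputs = []
    for sample in x:
        mult = [sample * c for c in h]     # multX: the signal times every coefficient
        conv_tmp = [m + s for m, s in zip(mult, stored + [0])]
        outputs.append(conv_tmp[0])        # first element is the convolution result
        stored = conv_tmp[1:]              # the rest are kept for the next signal
    return outputs


print(streaming_convolve([1, 2, 3, 4], [1, 10, 100]))   # [1, 12, 123, 234]
```

Each output is available as soon as the first product of the current signal has been added to the stored partial sum, which is the "optimal state" property the embodiment relies on for low output delay.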

Specifically, the steps are as follows:

When X = 1, the multiplication result mult1 of the 1st signal is added to the output of FIFO_conv(M) (all zeros) to obtain the intermediate result conv_tmp(1); the first signal in conv_tmp(1) is output as the convolution calculation result, and the remaining signals are stored in FIFO_conv(1).

When X = 2, the multiplication result mult2 of the 2nd signal is added to the multiply-add result of the 1st signal (i.e. the output of FIFO_conv(1)) to obtain the intermediate result conv_tmp(2); the first signal in conv_tmp(2) is output as the convolution calculation result, and the remaining signals are written into FIFO_conv(2).

And so on recursively: when X = M, the multiplication result multM of the M-th signal is added to the multiply-add result of the (M-1)-th signal (i.e. the output of FIFO_conv(M-1)) to obtain the intermediate result conv_tmp(M); the first signal in conv_tmp(M) is output as the convolution calculation result, and the remaining signals are written into FIFO_conv(M).

When X = M+1, M × N clocks have elapsed and the multiply-add operation of the 1st signal has just finished; the data in FIFO_conv(1) (the intermediate result of the 1st signal) has essentially been read out and used, so the addition for this signal can reuse the adder used by the 1st signal, and the result can be stored in FIFO_conv(1). Repeating this cycle multiplexes not only the adders but also the FIFOs, greatly improving the utilization of the FPGA's internal resources.
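The lane and FIFO reuse described above can be exercised with a cycle-level behavioural model. The following Python sketch is an illustration under stated assumptions, not the patented circuit: the function name `simulate_lanes`, the 0-based numbering of signals and lanes, the special-casing of the very first signal (whose predecessor output is all zeros) and the processing of older signals first within a clock are modelling choices made here.

```python
from collections import deque


def simulate_lanes(x, h, m_lanes, n_ratio):
    """Cycle-level sketch: M multiply-add lanes sharing M FIFO_conv buffers.

    Signal j (0-based) arrives at processing clock j*N, runs on lane j % M
    for M*N clocks, reads partial sums left by signal j-1 and writes its own.
    Returns (outputs, busy), where busy[t] lists the lanes in use at clock t.
    """
    taps = m_lanes * n_ratio
    assert len(h) == taps
    fifo_conv = [deque() for _ in range(m_lanes)]
    outputs, busy = [], []
    end_clock = (len(x) - 1) * n_ratio + taps      # first clock after the last signal is done
    for t in range(end_clock):
        active = [j for j in range(len(x)) if j * n_ratio <= t < j * n_ratio + taps]
        busy.append([j % m_lanes for j in active])
        for j in active:                           # older signals first
            i = t - j * n_ratio                    # coefficient index for signal j
            lane, prev_lane = j % m_lanes, (j - 1) % m_lanes
            if j == 0 or i == taps - 1:
                prev = 0                           # nothing stored yet, or past the stored tail
            else:
                prev = fifo_conv[prev_lane].popleft()
            conv_tmp = x[j] * h[i] + prev          # one multiplication and one addition
            if i == 0:
                outputs.append(conv_tmp)           # first element is the convolution output
            else:
                fifo_conv[lane].append(conv_tmp)   # the rest go to this lane's FIFO_conv
    return outputs, busy


x = [1, 2, 3, 4, 5, 6, 7]
h = [1, 2, 3, 4, 5, 6]                             # M*N = 6 coefficients
outputs, busy = simulate_lanes(x, h, m_lanes=3, n_ratio=2)
print(outputs)
print(max(len(b) for b in busy))                   # never more than M lanes in use
```

In this model the FIFO ordering alone keeps old and new partial sums apart when a lane is reused, and at no clock do two signals occupy the same lane.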

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. The device for realizing the time domain convolution calculation of the cross-clock domain based on the FPGA is characterized by comprising RAM_coef, FIFO_coef and FIFO_conv, all implemented on the FPGA;

RAM_coef is a RAM in which the convolution coefficients are stored in advance;

FIFO_coef denotes the (M-1) buffer FIFOs for storing convolution coefficients;

2. The device of claim 1, further comprising an input signal interface Sig_din;

3. The device for realizing time domain convolution calculation across clock domains based on the FPGA according to claim 1, wherein when the Xth signal is input and X is more than or equal to 1 and less than or equal to M, the Xth signal is input into the Xth multiplier for operation;

4. The device according to claim 1, wherein the X-th adder adds the multiplication result output by the X-th multiplier to the intermediate multiply-add result stored in the (X-1)-th FIFO_conv to obtain the X-th intermediate multiply-add result, the first signal of this result is output as the convolution calculation result, and the remaining signals are stored in the X-th FIFO_conv(X), where 1 ≤ X ≤ M.

5. The device according to claim 1, wherein when the X-th signal is input and 1 ≤ X ≤ M, the X-th signal is subjected to the multiply-add operation by the X-th multiplier and the X-th adder to obtain the multiply-add result of the X-th signal, the first signal of this result is output as the convolution calculation result, and the remaining signals are stored in the X-th FIFO_conv(X);

6. The apparatus according to claim 1, wherein the length of each of the (M-1) FIFO_coef is greater than N;

7. The method for implementing the device for time-domain convolution calculation across clock domains based on the FPGA according to any one of claims 1 to 6, comprising:

8. The method of claim 7, further comprising:

9. A real-time data processing system, characterized in that the time domain convolution calculation device based on the FPGA to realize the cross-clock domain is adopted to carry out convolution calculation on the time sequence signal according to any one of claims 1 to 6.

10. A radar data processing system, characterized in that the convolution calculation is performed on radar signals by using the time domain convolution calculation device based on the FPGA to realize the clock domain crossing according to any one of claims 1 to 6.