CN112986944A - CUDA heterogeneous parallel acceleration-based radar MTI and MTD implementation method - Google Patents


Info

Publication number
CN112986944A
Authority
CN
China
Prior art keywords: mtd, gpu, mti, cuda, matrix
Prior art date
Legal status (assumed, not a legal conclusion): Granted
Application number
CN202110238579.5A
Other languages: Chinese (zh)
Other versions: CN112986944B (en)
Inventor
贾宗衡
孙子棠
Current Assignee: Xidian University
Original Assignee: Xidian University
Priority date (assumed, not a legal conclusion)
Filing date
Publication date
Application filed by Xidian University
Priority application: CN202110238579.5A
Publication of CN112986944A
Application granted; publication of CN112986944B
Legal status: Active

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01S: RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00: Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/02: Details of systems according to group G01S13/00
    • G01S7/41: using analysis of echo signal for target characterisation; Target signature; Target cross-section
    • G01S7/414: Discriminating targets with respect to background clutter
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The invention relates to the technical field of radar signal processing and provides a radar MTI and MTD implementation method based on CUDA heterogeneous parallel acceleration. The method comprises the following steps: set the radar signal-processing parameter values in the CPU and copy the pulse-compressed echo matrix data into GPU device memory; partition the kernel's thread organization and execute a second-order canceller MTI kernel on the GPU; execute a matrix-transpose kernel on the GPU, complete the parallel FFT of multiple groups of Doppler channels with the cuFFT library, execute the transpose kernel again to obtain the output of the MTD parallel algorithm, and transfer the result back to the CPU; optimize the MTI and MTD kernels with CUDA code-optimization strategies and plot the optimized speedup curve. The optimized parallel algorithm reaches a speedup of 142.66 times, readily meets the real-time requirements of radar signal processing, and, being developed on the CUDA software stack and the Visual Studio platform, is convenient to extend and port.

Description

CUDA heterogeneous parallel acceleration-based radar MTI and MTD implementation method
Technical Field
The invention belongs to the technical field of radar signal processing, and particularly relates to a radar MTI and MTD implementation method based on CUDA heterogeneous parallel acceleration, which uses GPU parallel computing power and the CUDA heterogeneous programming model to guarantee the real-time performance of the MTI and MTD algorithms when the radar processes large volumes of echo data, and which is easy to port across platforms.
Background
During signal processing, radar achieves clutter suppression by means of moving target indication (MTI) and moving target detection (MTD). MTI processing exploits the fact that, relative to the radar's objects of interest, clutter exhibits a smaller Doppler frequency in the frequency domain: a digital canceller cancels each range cell one by one, filtering out stationary clutter and improving the signal-to-clutter ratio. However, MTI cannot obtain the Doppler frequency of a moving object in advance, so MTD processing is needed to suppress clutter outside the echo band. MTD is typically implemented by cascading, after the MTI filter, a bank of adjacent narrow-band Doppler filters matched to the coherent echo burst. As the modern battlefield electromagnetic environment grows increasingly complex and echo data volumes keep increasing, serial processing on the CPU becomes very time-consuming and struggles to meet the real-time requirements of radar signal processing.
The GPU, the core component of the graphics card, has a hardware architecture with a high degree of parallelism and outperforms the CPU at parallel computation. CUDA (Compute Unified Device Architecture) is a general-purpose parallel computing platform introduced by NVIDIA; it supports heterogeneous cooperation between CPU and GPU, and its programming model combines the logic control at which the CPU excels with the parallel computation at which the GPU excels. Some research has already been directed at MTI and MTD algorithms on the CUDA platform.
A master's thesis from the University of Electronic Science and Technology of China, "Warning radar signal processing software design based on GPU", proposes an MTI algorithm implemented on the GPU. The method is theoretically related to the MTI parallelization in the present invention, and its main steps are: first, copy the pulse-compressed echo data from the CPU to the GPU in first-in-first-out storage order; second, use a first-order canceller at the GPU end to perform two-pulse cancellation within a pulse repetition period; third, return the MTI-processed result to the CPU for scheduling. The method successfully reduces the time consumed by MTI processing, but its drawbacks are that the designed MTI filter has a narrow stopband notch and a poor clutter-suppression effect, and that the lack of unified data precision causes large estimation errors.
A patent filed by the Institute of Electronics, Chinese Academy of Sciences, "A GPU-based fast implementation method for external-radiation-source radar signal processing" (application number CN201310176310.4; publication number CN103308897B), discloses a GPU-based MTD algorithm. Its main steps are: first, cross-reorganize the echo data to be processed, dividing the whole echo sequence into N equal-length data blocks and subdividing each block into L equal-length segments of M data points each; second, splice together, in order, the segments with the same segment number from different blocks, joining the tail of the i-th segment of the N-th block to the head of the (i+1)-th segment of the 1st block to form a new storage structure; third, copy the echo data in the new storage structure to the GPU and launch M × N threads; fourth, perform MTD processing on the GPU for each group of M × N spliced data points. The method effectively raises the degree of parallelism of the MTD algorithm, but it has shortcomings: for example, optimization of thread allocation and latency hiding are not considered.
Disclosure of Invention
The invention provides a radar MTI and MTD implementation method based on CUDA heterogeneous parallel acceleration, built on GPU hardware, the CUDA software stack, and its programming model; the process also includes the optimized design of code instructions and thread structures, so the radar signal-processing speed can be greatly improved.
The technical idea of the invention is to combine radar signal-processing algorithms with GPU parallel processing on the CUDA acceleration platform, adopting a CPU + GPU heterogeneous programming mode to realize efficient MTI and MTD parallel algorithms; the whole consists of a Host-side program executed by the CPU and a Device-side program executed by the GPU. The Host side is responsible for logic control and data management, specifically: setting simulation parameters, configuring the GPU thread hierarchy, allocating and releasing storage, reading radar echo data, copying data to the GPU, and invoking kernels. The Device side is responsible for the concrete execution of the kernels and CUDA library functions corresponding to the MTI and MTD parallel algorithms.
In order to achieve the purpose, the invention adopts the following technical scheme:
the method for realizing the radar MTI and MTD based on CUDA heterogeneous parallel acceleration comprises the following steps:
Step 1: set the radar signal-processing parameter values in the CPU, read the N_r × N_c echo data matrix X after pulse compression, and copy X into pre-allocated GPU device memory as the initial data for MTI processing;
Step 2: use a 2-dimensional thread index to assign the grid (Grid) and thread block (Block) sizes of the CUDA threads, execute the second-order canceller MTI kernel on the GPU, and output the echo data with stationary clutter and noise filtered out, together with the range cells in which moving targets are located;
Step 3: for the N_r × N_c result matrix X_MTI obtained in step 2, first execute a matrix-transpose kernel on the GPU, then call the cufftExecC2C function of the cuFFT library to complete the parallel FFT of multiple groups of Doppler channels, and finally execute the transpose kernel again to obtain the N_r × N_c matrix X_MTD output by the MTD parallel algorithm, which is copied from the GPU back to the CPU;
Step 4: optimize the MTI and MTD kernels implemented in steps 2 and 3 with strategies including code-instruction optimization, optimal thread allocation, and aligned and coalesced global memory access, and compute the speedup of the optimized CUDA heterogeneous parallel algorithm over the CPU serial algorithm.
In this CUDA heterogeneous parallel-accelerated radar MTI and MTD implementation method, the target echo data simultaneously carries range information (from time delay) and velocity information (from Doppler shift). First, the initial data matrix is stored in newly allocated video memory on the GPU device, and the CUDA thread model is laid out with a two-dimensional index; then the parallel MTI kernel, implemented on the second-order canceller principle, is executed on the GPU. Before MTD processing, the output matrix of the previous stage is transposed so that the Doppler-dimension data addresses are contiguous; the parallel FFT of the Doppler dimension is then completed on the GPU; finally, one more matrix transposition restores the expected target echo data.
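As a host-side illustration, the MTI then transpose/FFT/transpose data flow described above can be sketched in NumPy. This is a hypothetical sketch of the mathematics only, not the patent's CUDA kernels; the (pulses × range cells) matrix layout and function names are assumptions:

```python
import numpy as np

def mti_three_pulse(x):
    """Second-order (three-pulse) canceller along the pulse dimension:
    y[n] = x[n+2] - 2*x[n+1] + x[n] for an (Nr, Nc) pulse-by-range matrix."""
    return x[2:, :] - 2.0 * x[1:-1, :] + x[:-2, :]

def mtd(x_mti, n_fft=32):
    """MTD: batched FFT across pulses for every range cell. The transposes
    mirror the GPU transpose kernels that make the Doppler dimension
    contiguous before the batched cuFFT call."""
    xt = np.ascontiguousarray(x_mti.T)          # (Nc, Npulses), Doppler contiguous
    doppler = np.fft.fft(xt, n=n_fft, axis=1)   # one Doppler filter bank per range cell
    return doppler.T                            # back to (n_fft, Nc)

# Toy echo: stationary clutter (constant over pulses) in all range cells,
# plus a moving target in cell 3 with normalized Doppler fd*Tr = 0.25.
pulses, cells = 34, 8
n = np.arange(pulses)
x = np.ones((pulses, cells), dtype=complex)
x[:, 3] += np.exp(1j * 2 * np.pi * 0.25 * n)

y = mti_three_pulse(x)         # the stationary component cancels exactly
z = np.abs(mtd(y, n_fft=32))   # the moving target peaks in Doppler channel 8
```

In this toy run the constant (zero-Doppler) component is removed identically by the canceller, and the surviving moving-target energy concentrates in the Doppler channel matching fd*Tr = 0.25, i.e. channel 8 of 32.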
Compared with the prior art, the invention has the following advantages: first, all kernels are optimized according to CUDA optimization strategies, which fully raises the computation speed of the signal processing; second, balancing arithmetic precision against acceleration, the method uses single-precision floating-point numbers for functions of low arithmetic intensity, which suits the Turing-architecture GPU better and gives higher overall cost-effectiveness; third, the invention is developed on the CUDA software stack and the Visual Studio platform, is software-based and modular, and is convenient to extend and port.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of a method for implementing a CUDA heterogeneous parallel computing radar MTI and MTD provided by the invention;
FIG. 2 is a schematic diagram of a secondary canceller implementing the MTI algorithm according to the present invention;
FIG. 3 is a schematic diagram of a narrow-band Doppler filter bank structure for implementing the MTD algorithm provided by the present invention;
FIG. 4 is a simulation diagram of the result verification of executing only MTD kernel in the GPU provided by the present invention;
FIG. 5 is a simulation diagram of result verification of executing an MTI and an MTD kernel function in sequence in the GPU provided by the present invention;
FIG. 6 is an acceleration ratio curve of the optimized CUDA heterogeneous parallel algorithm and the CPU serial algorithm provided by the invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The flow chart of the CUDA heterogeneous parallel MTI and MTD algorithm provided by the embodiment of the invention is shown in FIG. 1. Specifically, the method comprises the following steps:
Step 1: set the radar signal-processing parameter values in the CPU, read the N_r × N_c echo data matrix X after pulse compression, and copy X into pre-allocated GPU device memory as the initial data for MTI processing.
Specifically, step 1 includes the following 2 sub-steps:
Step 1.1: set the transmit-signal parameters in the CPU, obtain the N_r × N_c echo matrix X after pulse compression, and allocate GPU device memory with the cudaMalloc function.
Step 1.2: copy each pulse-compressed echo datum from the CPU to the GPU with the cudaMemcpy function and the cudaMemcpyHostToDevice parameter; each CUDA thread stores the value of the current sampling point and the value of that sampling point after it passes through the delay line.
Step 2: use a 2-dimensional thread index to assign the grid (Grid) and thread block (Block) sizes of the CUDA threads, execute the second-order canceller MTI kernel on the GPU, and output the echo data with stationary clutter and noise filtered out, together with the range cells in which moving targets are located. Fig. 2 is a schematic diagram of the second-order canceller used for MTI processing, where the output signal Y(t) equals the convolution of the impulse response H(t) with the input X(t):

Y(t) = H(t) * X(t) = X(t) − 2X(t − T_r) + X(t − 2T_r)

with transfer function

H(z) = (1 − z⁻¹)² = 1 − 2z⁻¹ + z⁻²
Specifically, step 2 includes the following 3 sub-steps:
and 2.1, dividing the Grid (Grid) and the thread Block (Block) sizes of the thread organization according to the length of the echo data copied to the GPU, wherein each thread on the GridDim.x dimension is responsible for finishing two subtraction operations of a group of three-pulse cancellation.
And 2.2, executing a secondary canceller MTI kernel function, and finishing two times of subtraction operations of sampling points of the same distance resolution unit in a pulse repetition period in the GPU by utilizing the thread index value.
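The stopband behaviour implied by these two subtractions can be checked numerically from the transfer function H(z) = (1 − z⁻¹)²: the magnitude response is 4·sin²(ω/2), with exact nulls at zero Doppler and at integer multiples of the PRF (the blind speeds). A NumPy sketch, not part of the patent:

```python
import numpy as np

# Frequency response of the second-order canceller H(z) = 1 - 2 z^-1 + z^-2,
# evaluated over one PRF interval of normalized Doppler frequency f*Tr.
f = np.linspace(0.0, 1.0, 1001)
w = 2 * np.pi * f
H = 1 - 2 * np.exp(-1j * w) + np.exp(-2j * w)
mag = np.abs(H)

# Closed form: |H(e^{jw})| = 4 sin^2(w/2). The null at f = 0 removes
# stationary clutter; the null at f*Tr = 1 is the first blind speed.
assert np.allclose(mag, 4 * np.sin(w / 2) ** 2)
```

The deep null at zero Doppler is what cancels the stationary returns; the repeat nulls at multiples of the PRF are the blind speeds inherent to any pulse canceller.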
Step 3: for the N_r × N_c result matrix X_MTI obtained in step 2, first execute a matrix-transpose kernel on the GPU, then call the cufftExecC2C function of the cuFFT library to complete the parallel FFT of multiple groups of Doppler channels, and finally execute the transpose kernel again to obtain the N_r × N_c matrix X_MTD output by the MTD parallel algorithm, which is copied from the GPU back to the CPU. Fig. 3 is a schematic diagram of the MTD filter bank, formed by cascading an FFT-based narrow-band Doppler filter bank after the MTI; the amplitude-frequency response of the k-th filter is

|H_k(f)| = |sin[Nπ(f·T_r − k/N)] / sin[π(f·T_r − k/N)]|

where N is the number of target echo pulses, k indexes the k-th filter, and T_r is the pulse repetition period. Each range cell, together with N − 1 delay elements of T_r, covers the whole Doppler band.
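Under the standard FFT filter-bank interpretation of MTD, the k-th filter is the geometric sum H_k(f) = Σₙ e^{j2πn(f·T_r − k/N)}, whose magnitude reduces to the sin-ratio form |sin(Nπx)/sin(πx)| with x = f·T_r − k/N. A numerical check of that identity (an illustration, not the patent's code):

```python
import numpy as np

def doppler_filter_mag(f_tr, k, N):
    """Magnitude response of the k-th N-point FFT Doppler filter at
    normalized Doppler frequency f*Tr, by direct summation."""
    n = np.arange(N)
    return np.abs(np.sum(np.exp(1j * 2 * np.pi * n * (f_tr - k / N))))

def sin_ratio_mag(f_tr, k, N):
    """Closed form |sin(N*pi*x) / sin(pi*x)| with x = f*Tr - k/N."""
    x = f_tr - k / N
    return abs(np.sin(N * np.pi * x) / np.sin(np.pi * x))

N, k = 32, 5
for f_tr in (0.01, 0.07, 0.21, 0.4):   # points where sin(pi*x) != 0
    assert np.isclose(doppler_filter_mag(f_tr, k, N), sin_ratio_mag(f_tr, k, N))
```

At the filter's center frequency f·T_r = k/N the sum reaches its peak value N, which is the coherent integration gain of the N-pulse burst.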
Specifically, step 3 includes the following 5 sub-steps:
and 3.1, dividing the sizes of grids (Grid) and thread blocks (Block) of the thread organization, processing distance dimensional data of a plurality of channels by using a Grid dim.x dimension, and processing Doppler dimensional data of the plurality of channels by using a Grid dim.y dimension.
Step 3.2: for the matrix data produced by the second-order cancellation of the MTI kernel, configure the kernel so that the data of each Doppler channel is mapped into one thread block, and execute the matrix-transpose kernel on the GPU so that the range-dimension and Doppler-dimension data addresses become contiguous.
Step 3.3: create a cuFFT handle, call the CUDA library function cufftPlan2d to configure a 2-dimensional cuFFT plan, and execute the complex-to-complex parallel FFT along the Doppler dimension with the library function cufftExecC2C and the CUFFT_FORWARD parameter.
Step 3.4: execute the matrix-transpose kernel once more on the matrix obtained from the parallel FFT, yielding the N_r × N_c matrix X_MTD output by the MTD parallel algorithm. The Doppler shift f_d is obtained from the Doppler channel in which the moving target falls; the radial velocity and the velocity resolution are given by

v = c·f_d / (2·f_c)

Δv = c·Δf_d / (2·f_c) = c·f_r / (2·f_c·mtd_FFT)

where c is the speed of light, f_c the carrier frequency, Δf_d the Doppler resolution, f_r the pulse repetition frequency, and mtd_FFT the number of FFT points selected for MTD.
Step 3.5: copy the MTD-processed target echo data from the GPU to the CPU with the cudaMemcpy function and the cudaMemcpyDeviceToHost parameter, call the cufftDestroy function to destroy the cuFFT handle, and call the free and cudaFree functions to release the memory resources occupied by the CPU and the GPU respectively.
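The Doppler-channel-to-velocity mapping used in step 3.4 can be sketched with the parameter values given later in the simulation section (f_c = 10 GHz, T_r = 100 μs, 32-point FFT); the helper name is illustrative, not from the patent:

```python
C = 3.0e8   # speed of light, m/s

def channel_to_velocity(k, f_r, f_c, n_fft):
    """Doppler channel index k -> Doppler shift f_d = k * f_r / n_fft,
    then radial velocity v = c * f_d / (2 * f_c)."""
    f_d = k * f_r / n_fft
    return C * f_d / (2.0 * f_c)

f_r = 1.0 / 100e-6   # PRF = 10 kHz for Tr = 100 us
f_c = 10e9           # carrier frequency, Hz
n_fft = 32

v_step = channel_to_velocity(1, f_r, f_c, n_fft)   # velocity per Doppler channel
```

With these numbers one channel spans about 4.69 m/s, matching the roughly 4.7 m/s velocity resolution quoted in the result analysis, and channel 16 maps to 75 m/s, the speed of moving target 3 in the simulation.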
Step 4: optimize the MTI and MTD kernels implemented in steps 2 and 3 with strategies including code-instruction optimization, optimal thread allocation, and aligned and coalesced global memory access, and compute the speedup of the optimized CUDA heterogeneous parallel algorithm over the CPU serial algorithm.
Specifically, step 4 includes the following 4 CUDA optimization strategies:
(1) Code-instruction optimization. Arithmetic operators in the kernels are replaced with bit operators: the left-shift operator << replaces multiplication by powers of 2, and a modulo-2^n computation is replaced by a bitwise AND with (2^n − 1). In addition, the suffix 'f' is appended to all float-type literals, eliminating the unnecessary time cost of hidden double-to-float type conversions.
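Both substitutions are exact identities for the non-negative integers used in kernel index arithmetic; they are shown here in Python for brevity (on the GPU they trade integer multiply/modulo instructions for cheaper bit operations):

```python
# '<< k' multiplies by 2**k; '& (2**n - 1)' reduces modulo 2**n.
for x in range(0, 4096, 7):
    assert (x << 1) == x * 2            # left shift replaces multiply-by-2
    assert (x << 3) == x * 8            # left shift replaces multiply-by-8
    assert (x & (2**5 - 1)) == x % 32   # bitwise AND replaces modulo 2^5
```

Note that the modulo identity holds only for non-negative operands and power-of-two divisors, which is why it applies to thread and array indices.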
(2) Optimal thread allocation. The number of threads launched per thread block is configured to be an integer multiple of 32 and no more than 1024. For int-type data occupying 4 bytes, 256 threads are always placed in one thread block; for double-type data occupying 8 bytes, 128 threads are always placed in one thread block, which lets the execution units be reused more effectively and improves the efficiency of the CUDA instruction pipeline.
(3) Aligned global memory access. The head address of a global memory transaction on the GPU device is an integer multiple of the cache granularity, with accesses served as 32-byte L2 cache transactions or 128-byte L1 cache transactions; keeping global memory accesses aligned saves part of the bandwidth.
(4) Coalesced global memory access. Warps are made to start from aligned memory addresses, and all 32 threads in each warp access one contiguous memory block, so every datum transferred is needed by the warp; the coalescing degree of the accesses is 100%, which helps maximize memory throughput.
This completes the radar MTI and MTD implementation method based on CUDA heterogeneous parallel acceleration provided by the invention.
The effect of the invention is further explained below with simulation experiments.
1. Simulation conditions are as follows:
In the simulation experiments, the computer hardware and software environment is configured as follows: the GPU device is an NVIDIA GeForce GTX 1660 Ti graphics card with 6 GB of video memory and 1536 CUDA cores; the CPU is an Intel(R) Core(TM) i7-9750H processor with 6 cores / 12 threads and a 2.6 GHz base frequency; the operating system is 64-bit Windows 10 Professional; the heterogeneous parallel platform is CUDA Toolkit 10.2; the CUDA programming environment is Microsoft Visual Studio 2019; the algorithm verification platform is MATLAB R2020a.
The simulation parameters of the invention are as follows: a linear frequency-modulated signal is used as the radar transmit signal, with bandwidth B = 20 MHz, pulse width τ = 10 μs, pulse repetition period T_r = 100 μs, sampling frequency f_s = 100 MHz, and transmitter carrier frequency f_c = 10 GHz. Three moving targets to be detected are set in the simulation: moving target 1 at a range of 3 km with speed 250 m/s; moving target 2 at 6 km with speed 25 m/s; moving target 3 at 4 km with speed 75 m/s. Finally, a stationary target is set at a range of 1 km.
In the simulation, 1000 pulse repetition periods T_r transmitted by the radar constitute one data acquisition, which takes 100 ms. From the sampling frequency f_s, the number of sampling points N_s within one pulse repetition period T_r is 10^4. Since the CUDA program uses 4-byte float single-precision numbers, the data volume to be processed in one acquisition time is 10^3 · N_s · 4 bytes / 1024^2 ≈ 38.1 MB.
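The 38.1 MB figure follows directly from the stated parameters:

```python
pulses = 1000            # pulse repetition periods per data acquisition
f_s = 100e6              # sampling frequency, Hz
t_r = 100e-6             # pulse repetition period, s
n_s = round(f_s * t_r)   # sampling points per period: 10**4
bytes_per_sample = 4     # float single precision

size_mb = pulses * n_s * bytes_per_sample / 1024**2   # ~38.15 MB
```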
2. Simulation content:
FIG. 4 is a simulation diagram of executing only MTD kernels in the GPU, and then loading the output data to the MATLAB platform for result verification.
Fig. 5 is a simulation diagram in which the MTI and MTD kernel functions are executed in sequence in the GPU, and then the output data is loaded to the MATLAB platform for result verification.
FIG. 6 is an acceleration ratio curve of the CUDA heterogeneous parallel algorithm and the CPU serial algorithm of MTI + MTD after the optimization strategy of step 4 is used.
3. Simulation result analysis:
From the simulation parameters, the maximum unambiguous range of the radar is R_max = c·T_r/2 = 15 km, and the range resolution is ΔR = c/(2B) = 7.5 m. A 32-point FFT is used for MTD processing in the GPU, so the Doppler resolution is Δf_d = 1/(32·T_r) and the velocity resolution is Δv = c·Δf_d/(2·f_c) ≈ 4.7 m/s.
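These derived quantities can be recomputed directly from the simulation parameters (a quick numerical check):

```python
C = 3.0e8        # speed of light, m/s
B = 20e6         # transmit bandwidth, Hz
t_r = 100e-6     # pulse repetition period, s
f_c = 10e9       # carrier frequency, Hz
n_fft = 32       # FFT points used for MTD

r_max = C * t_r / 2            # maximum unambiguous range: 15000 m
delta_r = C / (2 * B)          # range resolution: 7.5 m
delta_fd = 1 / (n_fft * t_r)   # Doppler resolution: 312.5 Hz
delta_v = C * delta_fd / (2 * f_c)   # velocity resolution: ~4.69 m/s
```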
In Fig. 4 and Fig. 5, the x-axis represents range in m, the y-axis represents velocity in m/s, and the z-axis represents normalized amplitude. As can be seen from Fig. 4, when only MTD processing is performed in the GPU without MTI processing, a total of 4 targets are detected, and the stationary target at 1 km is not filtered out. As can be seen from Fig. 5, when the GPU performs MTI and then MTD processing, the stationary target is successfully filtered out, and, within the allowed error range, the MTD result output by the GPU matches the simulation parameters of the 3 expected moving targets, verifying the feasibility of the radar MTI and MTD implementation method based on CUDA heterogeneous parallel acceleration provided by the invention.
In Fig. 6, the horizontal axis is the data size processed during MTI + MTD processing on the CPU or GPU, and the vertical axis is the speedup, obtained by dividing the average time consumed by the CPU serial algorithm by the average time consumed by the CUDA heterogeneous parallel algorithm. As Fig. 6 shows, the speedup saturates as the processed data volume grows large; overall, the optimized MTI and MTD parallel algorithm reaches a speedup of 142.66 times, which readily meets the real-time requirements of radar signal processing, and the development mode based on the CUDA software stack and the Visual Studio platform also facilitates porting the algorithm across platforms.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (4)

1. A radar MTI and MTD realization method based on CUDA heterogeneous parallel acceleration is characterized by comprising the following steps:
Step 1: set the radar signal-processing parameter values in the CPU, read the N_r × N_c echo data matrix X after pulse compression, and copy X into pre-allocated GPU device memory as the initial data for MTI processing;
Step 2: use a 2-dimensional thread index to assign the grid (Grid) and thread block (Block) sizes of the CUDA threads, execute the second-order canceller MTI kernel on the GPU, and output the echo data with stationary clutter and noise filtered out, together with the range cells in which moving targets are located;
Step 3: for the N_r × N_c result matrix X_MTI obtained in step 2, first execute a matrix-transpose kernel on the GPU, then call the cufftExecC2C function of the cuFFT library to complete the parallel FFT of multiple groups of Doppler channels, and finally execute the transpose kernel again to obtain the N_r × N_c matrix X_MTD output by the MTD parallel algorithm, which is copied from the GPU back to the CPU;
Step 4: optimize the MTI and MTD kernels implemented in steps 2 and 3 with strategies including code-instruction optimization, optimal thread allocation, and aligned and coalesced global memory access, and compute the speedup of the optimized CUDA heterogeneous parallel algorithm over the CPU serial algorithm.
2. The method according to claim 1, characterized in that step 1 comprises in particular the following sub-steps:
Step 1.1: set the transmit-signal parameters in the CPU, obtain the N_r × N_c echo matrix X after pulse compression, and allocate GPU device memory with the cudaMalloc function.
Step 1.2: copy each pulse-compressed echo datum from the CPU to the GPU with the cudaMemcpy function and the cudaMemcpyHostToDevice parameter; each CUDA thread stores the value of the current sampling point and the value of that sampling point after it passes through the delay line.
3. The method according to claim 1, characterized in that step 2 comprises in particular the following sub-steps:
and 2.1, dividing the Grid (Grid) and the thread Block (Block) sizes of the thread organization according to the length of the echo data copied to the GPU, wherein each thread on the GridDim.x dimension is responsible for finishing two subtraction operations of a group of three-pulse cancellation.
And 2.2, executing a secondary canceller MTI kernel function, and finishing two times of subtraction operations of sampling points of the same distance resolution unit in a pulse repetition period in the GPU by utilizing the thread index value.
4. The method according to claim 1, characterized in that step 3 comprises in particular the following sub-steps:
and 3.1, dividing the sizes of grids (Grid) and thread blocks (Block) of the thread organization, processing distance dimensional data of a plurality of channels by using a Grid dim.x dimension, and processing Doppler dimensional data of the plurality of channels by using a Grid dim.y dimension.
Step 3.2: for the matrix data produced by the double-cancellation MTI kernel function, configure the kernel so that each Doppler channel's data maps to one thread block, and execute the matrix transpose kernel function in the GPU so that the range-dimension and Doppler-dimension data addresses become contiguous.
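The transpose of step 3.2 reorders the Nr×Nc matrix so that each Doppler channel's samples become contiguous in memory before the FFT. A plain-C sketch of that reordering — the CUDA kernel assigns one Doppler channel per thread block, whereas here a nested loop stands in for the grid, and the names are illustrative:

```c
#include <assert.h>

/* Illustrative sketch of the matrix-transpose step that makes the
 * Doppler dimension contiguous before the batched FFT.  `in` is an
 * nr x nc matrix in row-major order; `out` receives its nc x nr
 * transpose.  In the CUDA version each thread block handles one
 * Doppler channel; here the double loop stands in for the grid. */
void transpose(const float *in, float *out, int nr, int nc)
{
    for (int r = 0; r < nr; ++r)
        for (int c = 0; c < nc; ++c)
            out[c * nr + r] = in[r * nc + c];
}
```

After the transpose, samples that belonged to one Doppler channel sit at consecutive addresses, so the subsequent FFT reads memory sequentially instead of with an Nc-element stride.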
Step 3.3: create a cuFFT handle, call the CUDA library function cufftPlan2d to configure a 2-D cuFFT plan, and execute the complex-to-complex parallel FFT along the Doppler dimension with the library function cufftExecC2C and the CUFFT_FORWARD parameter.
Step 3.4: execute the matrix transpose kernel function once more on the matrix produced by the parallel FFT to obtain the Nr×Nc matrix X_MTD output by the MTD parallel algorithm. The Doppler shift f_d is read from the Doppler channel in which the moving target lies; the radial velocity and the velocity resolution are then given by:

v_r = c·f_d / (2·f_c)

Δv = c·Δf_d / (2·f_c) = c·f_r / (2·f_c·N_mtdFFT)

where c is the speed of light, f_c is the carrier frequency, Δf_d = f_r / N_mtdFFT is the Doppler resolution, f_r is the pulse repetition frequency, and N_mtdFFT is the number of FFT points selected for the MTD.
Step 3.5: copy the MTD-processed target echo data from the GPU back to the CPU using the cudaMemcpy function with the cudaMemcpyDeviceToHost parameter, call the cufftDestroy function to destroy the cuFFT handle, and call the free and cudaFree functions to release the memory resources occupied by the CPU and GPU, respectively.
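The velocity readout of step 3.4 can be checked with a small plain-C sketch of the two formulas. The carrier frequency, Doppler shift, and pulse repetition frequency used in the test values are assumed examples, not parameters from the patent:

```c
#include <assert.h>

/* Illustrative sketch of the step-3.4 readout formulas.
 * radial_velocity:      v_r = c * f_d / (2 * f_c)
 * velocity_resolution:  dv  = c * (f_r / N_fft) / (2 * f_c)
 * All frequency values passed in are example assumptions. */
static const double C_LIGHT = 3.0e8;  /* speed of light, m/s */

double radial_velocity(double f_d, double f_carrier)
{
    return C_LIGHT * f_d / (2.0 * f_carrier);
}

double velocity_resolution(double f_pr, int n_fft, double f_carrier)
{
    double delta_fd = f_pr / n_fft;   /* Doppler resolution, Hz */
    return C_LIGHT * delta_fd / (2.0 * f_carrier);
}
```

For an assumed 1 GHz carrier (0.3 m wavelength), a 1 kHz Doppler shift maps to 150 m/s of radial velocity, and a 1 kHz PRF with 100 FFT points gives a 1.5 m/s velocity resolution.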
CN202110238579.5A 2021-03-04 2021-03-04 Radar MTI and MTD implementation method based on CUDA heterogeneous parallel acceleration Active CN112986944B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110238579.5A CN112986944B (en) 2021-03-04 2021-03-04 Radar MTI and MTD implementation method based on CUDA heterogeneous parallel acceleration


Publications (2)

Publication Number Publication Date
CN112986944A true CN112986944A (en) 2021-06-18
CN112986944B CN112986944B (en) 2023-09-08

Family

ID=76352588


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704520A (en) * 2021-10-27 2021-11-26 天津(滨海)人工智能军民融合创新中心 Method and device for accelerating Anchor-based data processing by using cuda in parallel and electronic equipment
CN116502028A (en) * 2023-04-28 2023-07-28 中国科学院软件研究所 Large-scale FFT (fast Fourier transform) implementation method and device based on floating point number compression technology
CN117152259A (en) * 2023-11-01 2023-12-01 常熟理工学院 Micro-assembly positioning acceleration method and system based on multichannel microscopic vision guidance
CN117687779A (en) * 2023-11-30 2024-03-12 山东诚泉信息科技有限责任公司 Complex electric wave propagation prediction rapid calculation method based on heterogeneous multi-core calculation platform

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104849698A (en) * 2015-05-21 2015-08-19 中国人民解放军海军工程大学 Radar signal parallel processing method and system based on heterogeneous multinucleated system
WO2018045566A1 (en) * 2016-09-09 2018-03-15 深圳大学 Random pulse doppler radar angle-doppler imaging method based on compressed sensing
CN110187962A (en) * 2019-04-26 2019-08-30 中国人民解放军战略支援部队信息工程大学 A kind of Gridding algorithm optimization method and device based on CUDA
CN110208752A (en) * 2019-06-27 2019-09-06 电子科技大学 A kind of radar MTI/MTD implementation method based on GPU


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TIAN Qianyuan; XU Zhaoyang; ZHAO Quan: "GPU-based software radar signal processing", Shipboard Electronic Countermeasure, no. 01 *



Similar Documents

Publication Publication Date Title
CN112986944A (en) CUDA heterogeneous parallel acceleration-based radar MTI and MTD implementation method
KR102385349B1 (en) Neural Network Instruction Set Architecture
JP2019537793A (en) Neural network calculation tile
US20190146796A1 (en) Uniform register file for improved resource utilization
CN111289975B (en) Rapid imaging processing system for multi-GPU parallel computing
CN109993293A (en) A kind of deep learning accelerator suitable for stack hourglass network
WO2016024508A1 (en) Multiprocessor device
CN112446471B (en) Convolution acceleration method based on heterogeneous many-core processor
Verhaegh et al. Efficiency improvements for force-directed scheduling
CN113407483B (en) Dynamic reconfigurable processor for data intensive application
CN113406572A (en) Radar parallel processing system and method, storage medium and terminal
CN114005458A (en) Voice noise reduction method and system based on pipeline architecture and storage medium
CN113359134A (en) SAR data distributed real-time imaging processing system and method based on embedded GPU
CN110208753B (en) GPU-based radar target echo signal acquisition method
CN103365821A (en) Address generator of heterogeneous multi-core processor
CN115951323A (en) Radar signal self-adaptive constant false alarm rate detection optimization method based on OpenCL
CN112732638B (en) Heterogeneous acceleration system and method based on CTPN network
CN115328440A (en) General sparse matrix multiplication implementation method and device based on 2D systolic array
Abdelrazek et al. A novel architecture using NVIDIA CUDA to speed up simulation of multi-path fast fading channels
US11640302B2 (en) SIMD processing unit performing concurrent load/store and ALU operations
CN110750752A (en) Interpolation method and device for analog quantity data
CN117785480B (en) Processor, reduction calculation method and electronic equipment
CN118034785B (en) Instruction compression method, device, accelerator and storage medium
Yang et al. The distributed imaging processing method of space-borne SAR based on embedded GPU
CN115827215A (en) Empty signal processing modular design method based on GPU acceleration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant